Abstract

We study error bounds for linear programming decoding of regular LDPC codes. For memoryless
binary-input output-symmetric channels, we prove bounds on the word error probability that are
inverse doubly-exponential in the girth of the factor graph. For the memoryless binary-input
AWGN channel, we prove lower bounds on the threshold for regular LDPC codes whose factor graphs
have logarithmic girth under LP-decoding. Specifically, we prove a lower bound of $\sigma = 0.735$
(upper bound of $E_b/N_0 = 2.67$ dB) on the threshold of (3, 6)-regular LDPC codes whose factor
graphs have logarithmic girth.

Our proof is an extension of a recent paper of Arora, Daskalakis, and Steurer [STOC 2009] 
who presented a novel probabilistic analysis of LP decoding over a binary symmetric chan- 
nel. Their analysis is based on the primal LP representation and has an explicit connection to 
message passing algorithms. We extend this analysis to any MBIOS channel. 
1 Introduction



Low-density parity-check (LDPC) codes were invented by Gallager [Gal63] in 1963. Gallager also
invented the first type of message-passing iterative decoding algorithm, known today as the
sum-product algorithm for a-posteriori probability (APP) decoding. Until the 1990s, iterative
decoding systems were forgotten with a few exceptions such as the landmark paper of Tanner [Tan81]
in 1981, who founded the study of codes defined by graphs. LDPC codes were rediscovered [MN96]
after the discovery of turbo-codes [BGT93]. LDPC codes have attracted a lot of research attention
since empirical studies demonstrate excellent decoding performance using iterative decoding
methods. Among the main results is the density-evolution technique for analyzing and designing
asymptotic LDPC codes [RU01]. A density-evolution analysis computes a threshold for the noise.
This means that if the noise in the channel is below that threshold, then the decoding error
diminishes exponentially as a function of the block length. The threshold results of [RU01] hold
for a random code from an ensemble of LDPC codes.

Feldman et al. [Fel03, FWK05] suggested a decoding algorithm for linear codes that is based
on linear programming. Initially, this idea seems to be counter-intuitive since codes are over
$\mathbb{F}_2^n$, whereas linear programming is over $\mathbb{R}^n$. Following ideas from approximation algorithms, linear
programming (LP) is regarded as a fractional relaxation of an integer program that models the 
problem of decoding. One can distinguish between integral solutions (vertices) and non-integral 
vertices of the LP. The integral vertices correspond to codewords, whereas the non-integral vertices 
are not codewords and are thus called pseudo-codewords. This algorithm, called LP-decoding, 
has two main advantages: (i) it runs in polynomial time, and (ii) when successful, LP-decoding 
provides an ML-certificate, i.e., a proof that its outcome agrees with maximum-likelihood (ML) 
decoding. 

Koetter and Vontobel showed that LP-decoding is equivalent to graph cover decoding [VK05].
Abstractly, graph cover decoding proceeds as follows. Given a received word, graph cover
decoding considers all possible M-covers of the Tanner graph of the code (for every integer M).
For every M-cover graph, the variables are assigned M copies of the received word. Maximum- 
likelihood (ML) decoding is applied to obtain a codeword in the code corresponding to the M- 
cover graph. The "best" ML-decoding result is selected among all covers. This lifted codeword 
is then projected (via averaging) to the base Tanner graph. Obviously, this averaging might yield 
a non-integral solution, namely, a pseudo-codeword as in the case of LP-decoding. Graph cover 
decoding provides a combinatorial characterization of LP-decoding and pseudo-codewords. 
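
The lift and projection operations behind graph cover decoding can be sketched in a few lines. The following Python fragment is only an illustration under our own assumptions: the toy base graph, the function names, and the choice of a uniformly random permutation per edge are hypothetical and do not come from [VK05].

```python
import random

def random_m_cover(edges, M, seed=0):
    """Edge list of a random M-cover of a bipartite Tanner graph.

    `edges` lists (variable, check) pairs of the base graph. Each base edge
    (i, j) is replaced by M edges ((i, a), (j, perm[a])) for a random
    permutation `perm` of {0, ..., M-1}.
    """
    rng = random.Random(seed)
    cover = []
    for (i, j) in edges:
        perm = list(range(M))
        rng.shuffle(perm)
        cover.extend(((i, a), (j, perm[a])) for a in range(M))
    return cover

def lift(x, M):
    """M-lift of a word: every copy (i, a) of variable i inherits x[i]."""
    return {(i, a): x[i] for i in range(len(x)) for a in range(M)}

def project(x_cover, n, M):
    """Scaled projection to the base graph: average the M copies of each variable."""
    return [sum(x_cover[(i, a)] for a in range(M)) / M for i in range(n)]

def satisfies_checks(cover_edges, x_cover):
    """True if every check node of the cover sees an even number of ones."""
    parity = {}
    for (var, chk) in cover_edges:
        parity[chk] = parity.get(chk, 0) ^ x_cover[var]
    return all(p == 0 for p in parity.values())

# Hypothetical toy base graph: check 0 = {x0, x1}, check 1 = {x1, x2}.
base_edges = [(0, 0), (1, 0), (1, 1), (2, 1)]
cover = random_m_cover(base_edges, M=3)
x_lifted = lift([1, 1, 1], M=3)          # (1, 1, 1) is a codeword of the base code
print(satisfies_checks(cover, x_lifted))  # the lift of a codeword is a cover codeword
print(project(x_lifted, n=3, M=3))
```

A codeword of the cover that is not the lift of a base codeword may project to a non-integral vector, which is exactly how pseudo-codewords arise in this picture.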

LP-decoding has been applied to several codes, among them: RA codes, turbo-like codes,
LDPC codes, and expander codes. Decoding failures have been characterized, and these
characterizations enabled proving word error bounds for RA codes, LDPC codes, and expander codes
(see e.g., [FK04, HE05, KV06, FS05, FMS+07, DDKW08, ADS09]). Experiments indicate that
message-passing decoding is likely to fail if LP-decoding fails [Fel03, VK05].



1.1 Previous Results 



Feldman et al. [FMS+07] were the first to show that LP-decoding corrects a constant fraction of
errors for expander codes over an adversarial bit flipping channel. For example, for a specific
family of rate-1/2 LDPC expander codes, they proved that LP-decoding can correct 0.000175n errors.
This kind of analysis is worst-case in its nature, and the implied results are quite far from the per- 
formance of LDPC codes observed in practice over binary symmetric channels (BSC). Daskalakis 
et al. [DDKW08] initiated an average-case analysis of LP-decoding for LDPC codes over a
probabilistic bit flipping channel. For a certain family of LDPC expander codes over a BSC with bit
flipping probability p, they proved that LP-decoding recovers the transmitted codeword with high 
probability up to a noise threshold of p = 0.002. This proved threshold for LP-decoding is rather 
weak compared to thresholds proved for belief propagation (BP) decoding over the BSC. For
example, even for (3, 6)-regular LDPC codes, the BP threshold is p = 0.084, and one would expect
LDPC expander codes to be much better under LP-decoding. Both of the results in [FMS+07] and
[DDKW08] were proved by analysis of the dual LP solution based on expansion arguments.
Extensions of [FMS+07] to a larger class of channels (e.g., truncated AWGN channel) were discussed
in [FKV05].

Koetter and Vontobel [KV06] analyzed LP-decoding of regular LDPC codes using girth arguments
and the dual LP solution. They proved a lower bound on the threshold of LP-decoding for
regular LDPC codes whose Tanner graphs have logarithmic girth over any memoryless channel.
This bound on the threshold depends only on the degree of the variable nodes. The decoding errors
for noise below the threshold decrease doubly-exponentially in the girth of the factor graph. This
was the first threshold result presented for LP-decoding of LDPC codes over memoryless channels
other than the BSC. When applied to LP-decoding of (3, 6)-regular LDPC codes over a BSC with
crossover probability p, they achieved a lower bound of p = 0.01 on the threshold. For the
binary-input additive white Gaussian noise channel with noise variance $\sigma^2$ (BI-AWGNC($\sigma$)), they
achieved a lower bound of $\sigma = 0.5574$ on the threshold (equivalent to an upper bound of
$E_b/N_0 = 5.07$ dB). The question of closing the gap to $\sigma = 0.82$ (1.7 dB) [WA01], which is the
threshold of the max-product (min-sum) decoding algorithm for the same family of codes over a
BI-AWGNC($\sigma$), remains open.
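
The pairs of noise levels and signal-to-noise ratios quoted above are consistent with the usual conversion $E_b/N_0 = 1/(2R\sigma^2)$ for unit-energy antipodal signaling and code rate $R$. The sketch below assumes this convention and the design rate $R = 1/2$ of a (3, 6)-regular code; the function name is ours.

```python
import math

def sigma_to_ebn0_db(sigma: float, rate: float) -> float:
    """E_b/N_0 in dB for +/-1 signaling over AWGN with noise std `sigma`.

    Assumed convention: unit symbol energy, E_b = 1/rate, N_0 = 2*sigma^2.
    """
    return 10.0 * math.log10(1.0 / (2.0 * rate * sigma ** 2))

RATE = 0.5  # design rate 1 - d_L/d_R of a (3, 6)-regular LDPC code
for sigma in (0.5574, 0.605, 0.735, 0.82):
    print(f"sigma = {sigma}: Eb/N0 = {sigma_to_ebn0_db(sigma, RATE):.3f} dB")
```

A larger tolerable $\sigma$ corresponds to a smaller required $E_b/N_0$, which is why a lower bound on the $\sigma$ threshold is stated as an upper bound in dB.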

Recently, Arora et al. [ADS09] presented a novel probabilistic analysis of the primal solution
of LP-decoding for regular LDPC codes over a BSC using girth arguments. They proved error 
bounds that are inverse doubly-exponential in the girth of the Tanner graph and lower bounds on 
thresholds that are much closer to the performance of BP-based decoding. For example, for a 
family of (3, 6) -regular LDPC codes whose Tanner graphs have logarithmic girth over a BSC with 
crossover probability p, they proved a lower bound of p = 0.05 on the threshold of LP-decoding. 
Their technique is based on a weighted decomposition of every codeword and pseudo-codeword 
to a finite set of structured trees. They proved a sufficient condition, called local-optimality, for 
the optimality of a decoded codeword based on this decomposition. They use a min-sum process 
on trees to bound the probability that local-optimality holds. A probabilistic analysis of the
min-sum process is applied to the structured trees of the decomposition, and yields error bounds
for LP-decoding.

In a following work, Vontobel [Von10] generalized the geometrical aspects presented by Arora
et al. [ADS09] to any code defined by a factor graph. Vontobel considered the general setup of
factor graphs with (i) non-uniform node degrees, (ii) with other types of constraint function nodes, 
and (iii) with no restriction on the girth. Vontobel constructed a weighted decomposition of every 
codeword and pseudo-codeword to a finite set of structured combinatorial entities. 

1.2 Our Contribution 

In this work, we extend the analysis in [ADS09] from the BSC to any memoryless binary-input
output-symmetric (MBIOS) channel. We prove bounds on the word error probability that are
inverse doubly-exponential in the girth of the factor graph for LP-decoding of regular LDPC codes
over MBIOS channels. We also prove lower bounds on the threshold of $(d_L, d_R)$-regular LDPC
codes whose Tanner graphs have logarithmic girth under LP-decoding in binary-input AWGN
channels. Note that regular Tanner graphs with logarithmic girth can be constructed explicitly (see
e.g. [Gal63]). Specifically, in a finite length analysis of LP-decoding over BI-AWGNC($\sigma$), we prove
that for (3, 6)-regular LDPC codes the decoding errors for $\sigma < 0.605$ ($E_b/N_0 > 4.36$ dB) decrease
doubly-exponentially in the girth of the factor graph. In an asymptotic case analysis, we prove a
lower bound of $\sigma = 0.735$ (upper bound of $E_b/N_0 = 2.67$ dB) on the threshold of (3, 6)-regular LDPC
codes under LP-decoding, thus decreasing the gap to the BP-based decoding asymptotic threshold.

In our analysis we utilize the combinatorial interpretation of LP-decoding via graph covers
[VK05] to simplify some of the proofs in [ADS09]. Specifically, using the equivalence of
graph cover decoding and LP-decoding in [VK05], we obtain a simpler proof that local-optimality
suffices for LP optimality.

Our main result: 

Theorem 1. Let $G$ denote a $(d_L, d_R)$-regular bipartite graph with girth $g$, and let
$\mathcal{C}(G) \subset \{0,1\}^n$ denote the low-density parity-check code defined by $G$. Let $x \in \mathcal{C}(G)$ be a
codeword. Consider the BI-AWGNC($\sigma$), and suppose that $y \in \mathbb{R}^n$ is the word obtained from the
channel given $x$. Then,

1) [finite length bound] For $(d_L, d_R) = (3, 6)$ and $\sigma \le 0.605$ ($E_b/N_0 \ge 4.36$ dB), $x$ is the unique
optimal solution to the LP decoder with probability at least

1 e^n ■ c 

125 

for some constant c < 1. 

2) [asymptotic bound] For $(d_L, d_R) = (3, 6)$ and $g = \Omega(\log n)$ sufficiently large, $x$ is the unique
optimal solution to the LP decoder with probability at least $1 - \exp(-n^{\gamma})$ for some constant
$0 < \gamma < 1$, provided that $\sigma \le 0.735$ ($E_b/N_0 \ge 2.67$ dB).

3) For any $(d_L, d_R)$, $x$ is the unique optimal solution to the LP decoder with probability at least

1 — n ■ c^ 1 -" 1 ) 13 ^ for some constant c < 1, provided that 

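$$\min_{t \ge 0} \left\{ (d_R - 1)\, e^{-t} \int_{-\infty}^{\infty} \big(1 - F_{\mathcal{N}}(z)\big)^{d_R - 2} f_{\mathcal{N}}(z)\, e^{-tz}\, dz \cdot \Big( (d_R - 1)\, e^{\frac{1}{2} t^2 \sigma^2 - t} \Big)^{1/(d_L - 2)} \right\} < 1,$$

where $f_{\mathcal{N}}(\cdot)$ and $F_{\mathcal{N}}(\cdot)$ denote the p.d.f. and c.d.f. of a Gaussian random variable with zero
mean and standard deviation $\sigma$, respectively.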

Theorem 1 generalizes to MBIOS channels as follows.

Theorem 2. Let $G$ denote a $(d_L, d_R)$-regular bipartite graph with girth $\Omega(\log n)$, and let
$\mathcal{C}(G) \subset \{0,1\}^n$ denote the low-density parity-check code defined by $G$. Consider an MBIOS
channel, and suppose that $y \in \mathbb{R}^n$ is the word obtained from the channel given $x = 0^n$. Let
$\lambda \in \mathbb{R}$ denote the log-likelihood ratio of the received channel observations, and let $f_{\lambda}(\cdot)$ and
$F_{\lambda}(\cdot)$ denote the p.d.f. and c.d.f. of $\lambda(y_i)$, respectively. Then, LP-decoding succeeds with
probability at least $1 - \exp(-n^{\gamma})$ for some constant $0 < \gamma < 1$, provided that

$$\min_{t \ge 0} \left\{ (d_R - 1) \int_{-\infty}^{\infty} \big(1 - F_{\lambda}(z)\big)^{d_R - 2} f_{\lambda}(z)\, e^{-tz}\, dz \cdot \Big( (d_R - 1)\, \mathbb{E} e^{-t\lambda} \Big)^{1/(d_L - 2)} \right\} < 1.$$



The remainder of this paper is organized as follows. Section 2 provides some background on
low-density parity-check codes and linear programming decoding over memoryless channels.
Section 3 presents a combinatorial characterization of a sufficient condition of LP-decoding
success for regular LDPC codes in memoryless channels. In Section 4 we use the combinatorial
characterization to bound the error probability of LP-decoding and provide lower bounds on the
threshold, thus proving Theorems 1 and 2. We conclude with a discussion in Section 5.



2 Preliminaries 

Low-density parity-check codes and factor graph representation. A code $\mathcal{C}$ with block length
$n$ over $\mathbb{F}_2$ is a subset of $\mathbb{F}_2^n$. Vectors in $\mathcal{C}$ are referred to as codewords. An $[n, k]$ binary linear
code is a $k$-dimensional vector subspace of the vector space $\mathbb{F}_2^n$. A parity-check matrix for an
$[n, k]$ binary linear code $\mathcal{C}$ is an $m \times n$ matrix $H$ with $\mathrm{rank}(H) = n - k \le m$ whose rows span the
space of vectors orthogonal to $\mathcal{C}$.

The factor graph representation of a code $\mathcal{C}$ is a bipartite graph $G$ that represents the matrix $H$.
The factor graph $G$ is over variable nodes $V_L = \{1, \ldots, n\}$ and check nodes $V_R = \{1, \ldots, m\}$. An
edge $(i, j)$ connects variable node $i$ and check node $j$ if $H_{j,i} = 1$. The variable nodes correspond
to bits of the codeword and the check nodes correspond to the rows of $H$. Every bipartite graph
defines a parity check matrix. If the bipartite graph is $(d_L, d_R)$-regular¹ for some constants $d_L$ and
$d_R$, then it defines a $(d_L, d_R)$-regular low-density parity-check (LDPC) code.
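
As a small illustration of the correspondence between $H$ and its factor graph, the following sketch builds the neighbor lists and tests regularity. The matrix is a hypothetical toy example (the vertex-edge incidence matrix of the complete graph on four vertices), chosen only because it is small and (2, 3)-regular; nodes are indexed from 0 rather than 1.

```python
def tanner_graph(H):
    """Return (var_nbrs, chk_nbrs): neighbor lists of variable and check nodes."""
    m, n = len(H), len(H[0])
    var_nbrs = [[j for j in range(m) if H[j][i] == 1] for i in range(n)]
    chk_nbrs = [[i for i in range(n) if H[j][i] == 1] for j in range(m)]
    return var_nbrs, chk_nbrs

def regular_degrees(H):
    """Return (d_L, d_R) if the factor graph of H is regular, otherwise None."""
    var_nbrs, chk_nbrs = tanner_graph(H)
    left = {len(nb) for nb in var_nbrs}
    right = {len(nb) for nb in chk_nbrs}
    if len(left) == 1 and len(right) == 1:
        return left.pop(), right.pop()
    return None

# Hypothetical toy parity-check matrix: 6 variable nodes, 4 check nodes.
H = [
    [1, 1, 1, 0, 0, 0],
    [1, 0, 0, 1, 1, 0],
    [0, 1, 0, 1, 0, 1],
    [0, 0, 1, 0, 1, 1],
]
print(regular_degrees(H))
```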



¹ That is, a bipartite graph with left vertices of degree $d_L$ and right vertices of degree $d_R$.
LP decoding over memoryless channels. Let $X_i \in \{0,1\}$ and $Y_i \in \mathbb{R}$ denote random variables
that correspond to the $i$th transmitted symbol (channel input) and the $i$th received symbol (channel
output), respectively. A memoryless binary-input output-symmetric (MBIOS) channel is defined
by a conditional probability density function $f_{Y_i/X_i}(y_i/x_i) \triangleq f(Y_i = y_i / X_i = x_i)$ that satisfies
$f_{Y_i/X_i}(y_i/0) = f_{Y_i/X_i}(-y_i/1)$. The log-likelihood ratio (LLR) vector $\lambda \in \mathbb{R}^n$ for a received word
$y \in \mathbb{R}^n$ is defined by

$$\lambda_i(y_i) \triangleq \ln\left( \frac{f_{Y_i/X_i}(y_i/0)}{f_{Y_i/X_i}(y_i/1)} \right)$$

for $i \in \{1, \ldots, n\}$. For a linear code $\mathcal{C}$, Maximum-Likelihood (ML) decoding is equivalent to

$$\hat{x}^{ML}(y) = \arg\min_{x \in \mathrm{conv}(\mathcal{C})} \langle \lambda(y), x \rangle, \tag{1}$$

where $\mathrm{conv}(\mathcal{C})$ denotes the convex hull of the set $\mathcal{C}$, where $\mathcal{C}$ is considered to be embedded in $\mathbb{R}^n$
in the natural way.
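
Since a linear objective over $\mathrm{conv}(\mathcal{C})$ attains its minimum at a codeword, ML decoding in the sense of (1) can be illustrated on toy codes by exhaustive search. The sketch below is only meant to make the objective concrete; the parity-check matrix and LLR values are hypothetical, and the enumeration is exponential in $n$.

```python
from itertools import product

def codewords(H):
    """Enumerate all x in {0,1}^n with H x = 0 (mod 2); exponential in n."""
    n = len(H[0])
    for x in product((0, 1), repeat=n):
        if all(sum(h * b for h, b in zip(row, x)) % 2 == 0 for row in H):
            yield x

def cost(llr, x):
    """Inner product <llr, x>."""
    return sum(l * b for l, b in zip(llr, x))

def ml_decode(H, llr):
    """Codeword minimizing <llr, x>, found by brute force."""
    return min(codewords(H), key=lambda x: cost(llr, x))

# Hypothetical toy parity-check matrix (6 variables, 4 checks).
H = [
    [1, 1, 1, 0, 0, 0],
    [1, 0, 0, 1, 1, 0],
    [0, 1, 0, 1, 0, 1],
    [0, 0, 1, 0, 1, 1],
]
llr = [1.0, 0.8, -0.3, 1.2, 0.9, 1.1]   # one unreliable position (negative LLR)
print(ml_decode(H, llr))
```

A positive LLR favors the bit value 0, so a single mildly negative entry is not enough to move the minimizer away from the all-zero codeword in this example.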

Solving in general the optimization problem in (1) for linear codes is intractable. Furthermore,
the decision problem of ML decoding remains NP-hard even for the class of left-regular LDPC
codes [XH07]. Feldman et al. [Fel03, FWK05] introduced a linear programming relaxation for the
problem of ML decoding of linear codes. Given a factor graph $G$, for every $j \in V_R$, denote by $\mathcal{C}_j$
the set of binary sequences that satisfy parity check constraint $j$,

$$\mathcal{C}_j = \Big\{ x \in \mathbb{F}_2^n \;:\; \sum_{i \in \mathcal{N}(j)} x_i = 0 \pmod 2 \Big\}.$$

Let $\mathcal{P}(G) = \bigcap_{j \in V_R} \mathrm{conv}(\mathcal{C}_j)$ denote the fundamental polytope [Fel03, FWK03, VK05] of a factor
graph $G$. For LDPC codes whose Tanner graphs have constant bounded right degree and a linear
number of edges, the fundamental polytope can be defined by a linear number of constraints. Given
an LLR vector $\lambda$ for a received word $y$, LP-decoding consists of solving the following optimization
problem

$$\hat{x}^{LP}(y) = \arg\min_{x \in \mathcal{P}(G)} \langle \lambda(y), x \rangle, \tag{2}$$

which can be solved in time polynomial in $n$ using linear programming.
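
To make the fundamental polytope concrete, the sketch below tests membership of a point using the explicit odd-set inequalities known from the LP-decoding literature: for every check $j$ and every odd-sized $S \subseteq \mathcal{N}(j)$, $\sum_{i \in S} x_i - \sum_{i \in \mathcal{N}(j) \setminus S} x_i \le |S| - 1$, together with $0 \le x_i \le 1$. This description is not stated in the present paper and is used here as an assumption; the check neighborhoods are a hypothetical toy example.

```python
from itertools import combinations

def in_fundamental_polytope(chk_nbrs, x, tol=1e-9):
    """Test box constraints and the odd-set inequalities of every check node."""
    if any(v < -tol or v > 1 + tol for v in x):
        return False
    for nbrs in chk_nbrs:
        for size in range(1, len(nbrs) + 1, 2):        # odd-sized subsets S of N(j)
            for S in combinations(nbrs, size):
                inside = sum(x[i] for i in S)
                outside = sum(x[i] for i in nbrs if i not in S)
                if inside - outside > size - 1 + tol:
                    return False
    return True

# Hypothetical toy factor graph: four checks of degree 3 on six variables.
chk_nbrs = [[0, 1, 2], [0, 3, 4], [1, 3, 5], [2, 4, 5]]
print(in_fundamental_polytope(chk_nbrs, [1, 1, 0, 1, 0, 0]))   # a codeword
print(in_fundamental_polytope(chk_nbrs, [1, 0, 0, 0, 0, 0]))   # violates check 0
print(in_fundamental_polytope(chk_nbrs, [0.5] * 6))            # a fractional point
```

The number of inequalities per check is exponential in the check degree but constant for bounded $d_R$, which matches the remark that a linear number of constraints suffices.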

Let us denote by BI-AWGNC($\sigma$) the binary input additive white Gaussian noise channel with
noise variance $\sigma^2$. The channel input $X_i$ at time $i$ is an element of $\{\pm 1\}$ since we map a bit
$b \in \{0, 1\}$ to $(-1)^b$. Given $X_i$, the channel outputs $Y_i = X_i + \phi_i$ where $\phi_i \sim \mathcal{N}(0, \sigma^2)$. For
BI-AWGNC($\sigma$), $\lambda_i(y_i) = \frac{2 y_i}{\sigma^2}$. Note that the optimal ML and LP solutions are invariant under positive
scaling of the LLR vector $\lambda$.
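
The LLR formula and the scaling invariance can be checked numerically. The sketch below assumes the mapping $b \mapsto (-1)^b$ described above; the candidate set and received values are hypothetical.

```python
def bi_awgn_llr(y, sigma):
    """LLR vector for BI-AWGNC(sigma) with the bit mapping b -> (-1)^b."""
    return [2.0 * yi / sigma ** 2 for yi in y]

def argmin_cost(candidates, llr):
    """Candidate minimizing <llr, x>."""
    return min(candidates, key=lambda x: sum(l * b for l, b in zip(llr, x)))

y = [1.0, -0.5]                      # hypothetical channel outputs
candidates = [(0, 0), (1, 1)]        # length-2 repetition code
llr = bi_awgn_llr(y, sigma=0.5)
# Decoding with the LLRs or with the raw outputs (a positive rescaling) agrees.
print(argmin_cost(candidates, llr) == argmin_cost(candidates, y))
```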

3 On the Connections between Local Optimality, Global Optimality, and LP Optimality

Let $x \in \mathcal{C}(G)$ denote a codeword and $\lambda(y) \in \mathbb{R}^n$ denote an LLR vector for a received word
$y \in \mathbb{R}^n$. Following [ADS09], we consider two questions: (i) does $x$ equal $\hat{x}^{ML}(y)$? and (ii) does
$x$ equal $\hat{x}^{LP}(y)$ and is it the unique solution? Arora et al. [ADS09] presented a certificate based
on local structures both for $\hat{x}^{ML}(y)$ and $\hat{x}^{LP}(y)$ over a binary symmetric channel. In this section
we present modifications of definitions and certificates to the case of memoryless binary-input
output-symmetric (MBIOS) channels.

Notation: Let $y \in \mathbb{R}^n$ denote the received word. Let $\lambda = \lambda(y)$ denote the LLR vector for
$y$. Let $x \in \mathcal{C}(G)$ be a candidate for $\hat{x}^{ML}(y)$ and $\hat{x}^{LP}(y)$. $G$ is a $(d_L, d_R)$-regular bipartite factor
graph. For two vertices $u$ and $v$, denote by $d(u, v)$ the distance between $u$ and $v$ in $G$. Denote by
$\mathcal{N}(v)$ the set of neighbors of a node $v$, and let $B(u, t)$ denote the set of vertices at distance at most
$t$ from $u$.

Following Arora et al. we consider neighborhoods $B(i_0, 2T)$ where $i_0 \in V_L$ and $T < \frac{1}{4}\mathrm{girth}(G)$.
Note that the induced graph on $B(i_0, 2T)$ is a tree.
Definition 3 (Minimal Local Deviation, [ADS09]). An assignment $\beta \in \{0,1\}^n$ is a valid deviation
of depth $T$ at $i_0 \in V_L$ or, in short, a $T$-local deviation at $i_0$, if $\beta_{i_0} = 1$ and $\beta$ satisfies all parity
checks in $B(i_0, 2T)$,

$$\forall j \in V_R \cap B(i_0, 2T): \quad \sum_{i \in \mathcal{N}(j)} \beta_i \equiv 0 \mod 2.$$

A $T$-local deviation $\beta$ at $i_0$ is minimal if $\beta_i = 0$ for every $i \notin B(i_0, 2T)$, and every check node $j$
in $B(i_0, 2T)$ has at most two neighbors with value 1 in $\beta$. A minimal $T$-local deviation at $i_0$ can be
seen as a subtree of $B(i_0, 2T)$ of height $2T$ rooted at $i_0$, where every variable node has full degree
and every check node has degree 2. Such a tree is called a skinny tree. An assignment $\beta \in \{0,1\}^n$
is a minimal $T$-local deviation if it is a minimal $T$-local deviation at some $i_0$. Note that given $\beta$
there is a unique such $i_0 = \mathrm{root}(\beta)$.

If $w = (w_1, \ldots, w_T) \in [0, 1]^T$ is a weight vector and $\beta$ is a minimal $T$-local deviation, then
$\beta^{(w)}$ denotes the $w$-weighted deviation

$$\beta^{(w)}_i = \begin{cases} w_t \beta_i & \text{if } d(\mathrm{root}(\beta), i) = 2t \text{ and } 1 \le t \le T, \\ 0 & \text{otherwise.} \end{cases}$$
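
The weighting rule can be made concrete with a breadth-first search from the root. The sketch below only computes $\beta^{(w)}$ from a given $\beta$ and root; it does not verify that $\beta$ is a minimal $T$-local deviation. The path-shaped toy graph and all names are hypothetical, and nodes are indexed from 0.

```python
from collections import deque

def weighted_deviation(var_nbrs, chk_nbrs, beta, root, w):
    """Return beta^(w): beta_i is scaled by w_t if variable i is at distance 2t from root."""
    T = len(w)
    dist = {root: 0}            # graph distance from root to variable nodes (always even)
    queue = deque([root])
    while queue:
        i = queue.popleft()
        if dist[i] >= 2 * T:    # no need to explore beyond depth 2T
            continue
        for j in var_nbrs[i]:               # step to a neighboring check node ...
            for k in chk_nbrs[j]:           # ... and on to its variable neighbors
                if k not in dist:
                    dist[k] = dist[i] + 2
                    queue.append(k)
    out = [0.0] * len(beta)
    for i, d in dist.items():
        t = d // 2
        if 1 <= t <= T:
            out[i] = w[t - 1] * beta[i]     # w = (w_1, ..., w_T) is stored 0-indexed
    return out

# Hypothetical toy graph: variables 0 - 1 - 2 on a path through checks 0 and 1.
var_nbrs = [[0], [0, 1], [1]]
chk_nbrs = [[0, 1], [1, 2]]
print(weighted_deviation(var_nbrs, chk_nbrs, beta=[1, 1, 1], root=0, w=[0.5, 0.25]))
```

Note that the root itself receives weight 0 under this definition, since only distances $2t$ with $t \ge 1$ are weighted.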



The following definition expands the notion of addition of codewords over $\mathbb{F}_2$ to the case where
one of the vectors is real.

Definition 4 ([Fel03]). Given a codeword $x \in \{0,1\}^n$ and a point $f \in [0,1]^n$, the relative point
$x \oplus f \in [0,1]^n$ is defined by $(x \oplus f)_i = |x_i - f_i|$.

Note that

$$(x \oplus f)_i = \begin{cases} 1 - f_i & \text{if } x_i = 1, \\ f_i & \text{if } x_i = 0. \end{cases}$$

Hence, for a fixed $x \in \{0,1\}^n$, $x \oplus f$ is an affine linear function in $f$. It follows that for any
distribution over vectors $f \in [0,1]^n$, we have $\mathbb{E}[x \oplus f] = x \oplus \mathbb{E}[f]$.
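
Both the exchange of expectation with the relative point and the convex-combination identity $x \oplus \alpha z = (1-\alpha)x + \alpha(x \oplus z)$ for binary $z$ are easy to confirm numerically. The sketch below uses hypothetical vectors and an empirical mean in place of the expectation.

```python
import random

def relative_point(x, f):
    """(x (+) f)_i = |x_i - f_i| for x in {0,1}^n and f in [0,1]^n."""
    return [abs(xi - fi) for xi, fi in zip(x, f)]

def mean(vectors):
    """Coordinate-wise average of a list of equal-length vectors."""
    k = len(vectors)
    return [sum(v[i] for v in vectors) / k for i in range(len(vectors[0]))]

rng = random.Random(0)
x = [1, 0, 1, 0]
samples = [[rng.random() for _ in x] for _ in range(1000)]

# E[x (+) f] versus x (+) E[f], with the empirical mean standing in for E.
lhs = mean([relative_point(x, f) for f in samples])
rhs = relative_point(x, mean(samples))
print(max(abs(a - b) for a, b in zip(lhs, rhs)) < 1e-9)

# x (+) (alpha z) = (1 - alpha) x + alpha (x (+) z) for a binary vector z.
z, alpha = [1, 1, 0, 0], 0.25
left = relative_point(x, [alpha * zi for zi in z])
right = [(1 - alpha) * xi + alpha * ri for xi, ri in zip(x, relative_point(x, z))]
print(left == right)
```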
Given a log-likelihood ratio vector $\lambda$, the cost of a $w$-weighted minimal $T$-local deviation $\beta$
is defined by $\langle \lambda, \beta^{(w)} \rangle$. The following definition is an extension of local-optimality from BSC to
LLR.

Definition 5 (local-optimality following [ADS09]). A codeword $x \in \{0,1\}^n$ is $(T, w)$-locally
optimal for $\lambda \in \mathbb{R}^n$ if for all minimal $T$-local deviations $\beta$,

$$\langle \lambda, x \oplus \beta^{(w)} \rangle > \langle \lambda, x \rangle.$$

Since $\beta^{(w)} \in [0,1]^n$, we consider only weight vectors $w \in [0,1]^T \setminus \{0^T\}$. Koetter and Vontobel
[KV06] proved for $w = 1^T$ that a locally optimal codeword $x$ for $\lambda$ is also globally optimal, i.e.,
the ML codeword. Moreover, they also showed that a locally optimal codeword $x$ for $\lambda$ is also the
unique optimal LP solution given $\lambda$. Arora et al. [ADS09] used a different technique to prove that
local-optimality is sufficient both for global optimality and LP optimality with general weights in
the case of a binary symmetric channel. We extend the results of Arora et al. [ADS09] to the case
of MBIOS channels. Specifically, we prove for MBIOS channels that local-optimality implies LP
optimality (Theorem 9). We first show how to extend the proof that local-optimality implies global
optimality in the case of MBIOS channels.

Theorem 6 (local-optimality is sufficient for ML). Let $T < \frac{1}{4}\mathrm{girth}(G)$ and $w \in [0,1]^T$. Let
$\lambda \in \mathbb{R}^n$ denote the log-likelihood ratio for the received word, and suppose that $x \in \{0,1\}^n$ is
a $(T, w)$-locally optimal codeword in $\mathcal{C}(G)$ for $\lambda$. Then $x$ is also the unique maximum-likelihood
codeword for $\lambda$.

The proof for MBIOS channels is a straightforward modification of the proof in [ADS09]. We
include it for the sake of self-containment. The following lemma is the key structural lemma in the
proof of Theorem 6.

Lemma 7 ([ADS09]). Let $T < \frac{1}{4}\mathrm{girth}(G)$. Then, for every codeword $z \ne 0^n$, there exists a
distribution over minimal $T$-local deviations $\beta$ such that, for every weight vector $w \in [0,1]^T$,
there exists an $\alpha \in (0, 1]$, such that

$$\mathbb{E}_{\beta}\, \beta^{(w)} = \alpha z.$$

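Proof of Theorem 6. We want to show that for every codeword $x' \ne x$, $\langle \lambda, x' \rangle > \langle \lambda, x \rangle$. Since
$z = x \oplus x'$ is a codeword, by Lemma 7 there exists a distribution over minimal $T$-local deviations
$\beta$ such that $\mathbb{E}_{\beta}\, \beta^{(w)} = \alpha z$. Let $f : [0,1]^n \to \mathbb{R}$ be the affine linear function defined by
$f(u) = \langle \lambda, x \oplus u \rangle = \langle \lambda, x \rangle + \sum_{i=1}^{n} (-1)^{x_i} \lambda_i u_i$. Then,

$$\begin{aligned}
\langle \lambda, x \rangle &< \mathbb{E}_{\beta} \langle \lambda, x \oplus \beta^{(w)} \rangle && \text{(by local-optimality of } x\text{)} \\
&= \langle \lambda, x \oplus \mathbb{E}_{\beta}\, \beta^{(w)} \rangle && \text{(by linearity of } f \text{ and linearity of expectation)} \\
&= \langle \lambda, x \oplus \alpha z \rangle && \text{(by Lemma 7)} \\
&= \langle \lambda, (1 - \alpha) x + \alpha (x \oplus z) \rangle \\
&= \langle \lambda, (1 - \alpha) x + \alpha x' \rangle \\
&= (1 - \alpha) \langle \lambda, x \rangle + \alpha \langle \lambda, x' \rangle,
\end{aligned}$$

which implies that $\langle \lambda, x' \rangle > \langle \lambda, x \rangle$ as desired.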

In order to prove a sufficient condition for LP optimality, we consider graph cover decoding
introduced by Vontobel and Koetter [VK05]. We use the terms and notation of Vontobel and
Koetter [VK05] in the statement of Lemma 8 and the proof of Theorem 9 (see Appendix A). The
following lemma shows that local-optimality is preserved after lifting to an M-cover. Note that the
weight vector must be scaled by the cover degree $M$.

Lemma 8. Let $T < \frac{1}{4}\mathrm{girth}(G)$ and $w \in [0, \frac{1}{M}]^T \setminus \{0^T\}$. Let $\tilde{G}$ denote any $M$-cover of $G$. Suppose
that $x \in \mathcal{C}(G)$ is a $(T, w)$-locally optimal codeword for $\lambda \in \mathbb{R}^n$. Let $\tilde{x} = x^{\uparrow M} \in \mathcal{C}(\tilde{G})$ and
$\tilde{\lambda} = \lambda^{\uparrow M} \in \mathbb{R}^{n \cdot M}$ denote the $M$-lifts of $x$ and $\lambda$, respectively. Then $\tilde{x}$ is a $(T, M \cdot w)$-locally
optimal codeword for $\tilde{\lambda}$.

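Proof. Assume that $\tilde{x} = x^{\uparrow M}$ is not a $(T, M \cdot w)$-locally optimal codeword for $\tilde{\lambda} = \lambda^{\uparrow M}$. Then,
there exists a minimal $T$-local deviation $\tilde{\beta} \in \{0,1\}^{n \cdot M}$ such that

$$\langle \tilde{\lambda}, \tilde{x} \oplus \tilde{\beta}^{(M \cdot w)} \rangle \le \langle \tilde{\lambda}, \tilde{x} \rangle. \tag{3}$$

Note that for $\tilde{x} \in \{0,1\}^{n \cdot M}$ and its projection $x = p(\tilde{x}) \in \mathbb{R}^n$, it holds that

$$\frac{1}{M} \langle \tilde{\lambda}, \tilde{x} \rangle = \langle \lambda, x \rangle, \text{ and} \tag{4}$$

$$\frac{1}{M} \langle \tilde{\lambda}, \tilde{x} \oplus \tilde{\beta}^{(M \cdot w)} \rangle = \langle \lambda, x \oplus \beta^{(w)} \rangle, \tag{5}$$

where $\beta$ is the support of the projection of $\tilde{\beta}$ onto the base graph. It holds that $\beta$ is a $T$-local
deviation because $T < \frac{1}{4}\mathrm{girth}(G) \le \frac{1}{4}\mathrm{girth}(\tilde{G})$. From (3), (4), and (5) we get that
$\langle \lambda, x \rangle \ge \langle \lambda, x \oplus \beta^{(w)} \rangle$, contradicting our assumption on the $(T, w)$-local optimality of $x$.
Therefore, $\tilde{x}$ is a $(T, M \cdot w)$-locally optimal codeword for $\tilde{\lambda}$ in $\mathcal{C}(\tilde{G})$. □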

Arora et al. [ADS09] proved the following theorem for a BSC and $w \in [0,1]^T$. The proof
can be extended to the case of MBIOS channels with $w \in [0,1]^T$ using the same technique of
Arora et al. A simpler proof is achieved for $w \in [0, \frac{1}{M}]^T$ for some finite $M$. The proof is based
on arguments utilizing properties of graph cover decoding [VK05], and follows as a corollary of
Theorem 6 and Lemma 8.

Theorem 9 (local-optimality is sufficient for LP optimality). For every factor graph $G$, there exists
a constant $M$ such that, if

1. $T < \frac{1}{4}\mathrm{girth}(G)$,

2. $w \in [0, \frac{1}{M}]^T \setminus \{0^T\}$, and

3. $x$ is a $(T, w)$-locally optimal codeword for $\lambda \in \mathbb{R}^n$,

then $x$ is also the unique optimal LP solution given $\lambda$.

Proof. Suppose that $x$ is a $(T, w)$-locally optimal codeword for $\lambda \in \mathbb{R}^n$. Vontobel and Koetter
[VK05] proved that for every basic feasible solution $z \in [0,1]^n$ of the LP, there exists an $M$-cover
$\tilde{G}$ of $G$ and an assignment $\tilde{z} \in \{0,1\}^{n \cdot M}$ such that $\tilde{z} \in \mathcal{C}(\tilde{G})$ and $z = p(\tilde{z})$, where $p(\tilde{z})$ is the
image of the scaled projection of $\tilde{z}$ in $G$ (i.e., the pseudo-codeword associated with $\tilde{z}$). Moreover,
since the number of basic feasible solutions is finite, we conclude that there exists a finite $M$-cover
$\tilde{G}$ such that every basic feasible solution of the LP admits a valid assignment in $\tilde{G}$.

Let $z^*$ denote an optimal LP solution given $\lambda$. Without loss of generality $z^*$ is a basic feasible
solution. Let $\tilde{z}^* \in \{0,1\}^{n \cdot M}$ denote the 0-1 assignment in the $M$-cover $\tilde{G}$ that corresponds
to $z^* \in [0,1]^n$. By the equivalence of LP-decoding and graph cover decoding [VK05], (4), and
the optimality of $z^*$ it follows that $\tilde{z}^*$ is a codeword in $\mathcal{C}(\tilde{G})$ that minimizes $\langle \tilde{\lambda}, \tilde{z} \rangle$ for $\tilde{z} \in \mathcal{C}(\tilde{G})$,
namely $\tilde{z}^* = \hat{x}^{ML}(y^{\uparrow M})$.

Let $\tilde{x} = x^{\uparrow M}$ denote the $M$-lift of $x$. Note that because $x$ is a codeword, i.e., $x \in \{0,1\}^n$,
there is a unique pre-image of $x$ in $\tilde{G}$, which is the $M$-lift of $x$. Lemma 8 implies that $\tilde{x}$ is a
$(T, M \cdot w)$-locally optimal codeword for $\tilde{\lambda}$, where $M \cdot w \in [0,1]^T$. By Theorem 6, we also get
that $\tilde{x} = \hat{x}^{ML}(y^{\uparrow M})$. Moreover, Theorem 6 guarantees the uniqueness of an ML optimal solution.
Thus, $\tilde{x} = \tilde{z}^*$. By projection to $G$, since $\tilde{x} = \tilde{z}^*$, we get that $x = z^*$ and uniqueness follows, as
required. □

From this point, let $M$ denote the constant whose existence is guaranteed by Theorem 9.

4 Proving Error Bounds Using Local Optimality 

In order to simplify the probabilistic analysis of algorithms for decoding linear codes over
symmetric channels, one can assume without loss of generality that the all-zero codeword was
transmitted, i.e., $x = 0^n$. Note that the correctness of the all-zero assumption depends on the
employed decoding algorithm. Although this assumption is trivial for ML decoding because of the
symmetry of a linear code $\mathcal{C}(G)$, it is not immediately clear in the context of LP-decoding. Feldman
et al. [Fel03, FWK05] noticed that the fundamental polytope $\mathcal{P}(G)$ is highly symmetric, and proved
that for binary-input output-symmetric channels, the probability that the LP decoder fails is
independent of the transmitted codeword. Therefore, one can assume that $x = 0^n$ when analyzing
LP-decoding failure for linear codes. The following lemma gives a structural characterization for
the event of LP-decoding failure if $x = 0^n$.

Lemma 10. Let $T < \frac{1}{4}\mathrm{girth}(G)$. Assume that the all-zero codeword was transmitted, and let
$\lambda \in \mathbb{R}^n$ denote the log-likelihood ratio for the received word. If the LP decoder fails to decode to
the all-zero codeword, then for every $w \in \mathbb{R}_+^T$ there exists a minimal $T$-local deviation $\beta$ such that

$$\langle \lambda, \beta^{(w)} \rangle \le 0.$$

Proof. Consider the event where the LP decoder fails to decode the all-zero codeword, i.e., $0^n$ is
not a unique optimal LP solution. Theorem 9 implies that there exists a constant $M$ such that, for
every $w' \in [0, \frac{1}{M}]^T \setminus \{0^T\}$, the all-zero codeword is not the $(T, w')$-locally optimal codeword for $\lambda$.
That is, there exists a minimal $T$-local deviation $\beta$ such that $\langle \lambda, \beta^{(w')} \rangle \le 0$. Let
$w' = \frac{1}{M \cdot \|w\|_\infty} \cdot w$. Therefore $\langle \lambda, \beta^{(w)} \rangle$ is also non-positive, as required. □

We therefore have for a fixed $T < \frac{1}{4}\mathrm{girth}(G)$ and $w \in \mathbb{R}_+^T$ that

$$P\{\text{LP decoding fails}\} \le P\{\exists \beta \text{ such that } \langle \lambda, \beta^{(w)} \rangle \le 0 \mid x = 0^n\}. \tag{6}$$

4.1 Bounding Processes on Trees 

Using the terminology of (6), Arora et al. [ADS09] suggested a recursive method for bounding the
probability $P\{\exists \beta \text{ such that } \langle \lambda, \beta^{(w)} \rangle \le 0 \mid x = 0^n\}$ for a BSC. We extend this method to MBIOS
channels and apply it to a BI-AWGN channel.

Let $G$ be a $(d_L, d_R)$-regular bipartite factor graph, and fix $T < \frac{1}{4}\mathrm{girth}(G)$. Let $\mathcal{T}_{v_0}$ denote the
subgraph induced by $B(v_0, 2T)$ for a variable node $v_0$. Since $T < \frac{1}{4}\mathrm{girth}(G)$, it follows that $\mathcal{T}_{v_0}$ is
a tree. We direct the edges of $\mathcal{T}_{v_0}$ so that it is an out-branching directed at the root $v_0$ (i.e., a rooted
spanning tree with directed paths from the root $v_0$ to all the nodes). For $l \in \{0, \ldots, 2T\}$, denote
by $V_l$ the set of vertices of $\mathcal{T}_{v_0}$ at height $l$ (the leaves have height 0 and the root has height $2T$). Let
$\tau \subseteq V(\mathcal{T}_{v_0})$ denote the vertex set of a skinny tree rooted at $v_0$.

Definition 11 (($T$, $\omega$)-Process on a $(d_L, d_R)$-Tree, [ADS09]). Let $\omega \in \mathbb{R}_+^T$ denote a weight vector.
Let $\lambda$ denote an assignment of real values to the variable nodes of $\mathcal{T}_{v_0}$. We define the $\omega$-weighted
value of a skinny tree $\tau$ by

$$\mathrm{val}_{\omega}(\tau; \lambda) \triangleq \sum_{l=0}^{T-1} \sum_{v \in \tau \cap V_{2l}} \omega_l \cdot \lambda_v.$$

Namely, the sum of the values of variable nodes in $\tau$ weighted according to their height.
Given a probability distribution over assignments $\lambda$, we are interested in the probability

$$\Pi_{\lambda, d_L, d_R}(T, \omega) \triangleq P_{\lambda}\Big\{ \min_{\tau \subset \mathcal{T}_{v_0}} \mathrm{val}_{\omega}(\tau; \lambda) \le 0 \Big\}. \tag{7}$$

In other words, $\Pi_{\lambda, d_L, d_R}(T, \omega)$ is the probability that the minimum value over all skinny trees
of height $2T$ rooted in some variable node $v_0$ in a $(d_L, d_R)$-bipartite graph $G$ is non-positive. For
every two roots $v_0$ and $v_1$ the trees $\mathcal{T}_{v_0}$ and $\mathcal{T}_{v_1}$ are isomorphic; it follows that $\Pi_{\lambda, d_L, d_R}(T, \omega)$ does
not depend on the root $v_0$.

Since $\lambda$ is a random assignment of values to variable nodes in $\mathcal{T}_{v_0}$, Arora et al. refer to
$\min_{\tau \subset \mathcal{T}_{v_0}} \mathrm{val}_{\omega}(\tau; \lambda)$ as a random process. With this notation, we apply a union bound utilizing
Lemma 10, as follows.

Lemma 12. Let $G$ be a $(d_L, d_R)$-regular bipartite graph and $w \in \mathbb{R}_+^T$ be a weight vector with
$T < \frac{1}{4}\mathrm{girth}(G)$. Suppose that $\lambda \in \mathbb{R}^n$ is the log-likelihood ratio of the word received from the channel.
Then, the transmitted codeword $x = 0^n$ is $(T, \alpha \cdot w)$-locally optimal for $\alpha \triangleq (M \cdot \|w\|_\infty)^{-1}$ with
probability at least

$$1 - n \cdot \Pi_{\lambda, d_L, d_R}(T, \omega), \quad \text{where } \omega_l = w_{T-l},$$

and with at least the same probability, $x = 0^n$ is also the unique optimal LP solution given $\lambda$.

Note the two different weight notations: (i) $w$ denotes the weight vector in the context of weighted
deviations, and (ii) $\omega$ denotes the weight vector in the context of skinny subtrees in the $(T, \omega)$-Process.
A one-to-one correspondence between these two vectors is given by $\omega_l = w_{T-l}$ for $0 \le l < T$.
From this point on, we will use only $\omega$.

Following Lemma 12, it is sufficient to estimate the probability $\Pi_{\lambda, d_L, d_R}(T, \omega)$ for a given
weight vector $\omega$, a distribution of a random vector $\lambda$, and degrees $(d_L, d_R)$. We overview the
recursion presented in [ADS09] for estimating and bounding the probability of the existence of a
skinny tree with non-positive value in a $(T, \omega)$-process.

Let $\{\gamma\}$ denote an ensemble of i.i.d. random variables. Define random variables $X_0, \ldots, X_{T-1}$
and $Y_0, \ldots, Y_{T-1}$ with the following recursion:

$$Y_0 = \omega_0 \gamma \tag{8}$$

$$X_l = \min\big\{ Y_l^{(1)}, \ldots, Y_l^{(d_R - 1)} \big\} \qquad (0 \le l < T) \tag{9}$$

$$Y_l = \omega_l \gamma + X_{l-1}^{(1)} + \ldots + X_{l-1}^{(d_L - 1)} \qquad (0 < l < T) \tag{10}$$

The notation $X^{(1)}, \ldots, X^{(d)}$ and $Y^{(1)}, \ldots, Y^{(d)}$ denotes $d$ mutually independent copies of the
random variables $X$ and $Y$, respectively. Each instance of $Y_l$, $0 \le l < T$, uses an independent instance
of a random variable $\gamma$.

Consider a directed tree $\mathcal{T} = \mathcal{T}_{v_0}$ of height $2T$, rooted at node $v_0$. Associate variable nodes
of $\mathcal{T}$ at height $2l$ with copies of $Y_l$, and check nodes at height $2l + 1$ with copies of $X_l$, for
$0 \le l < T$. Note that any realization of the random variables $\{\gamma\}$ to variable nodes in $\mathcal{T}$ can be
viewed as an assignment $\lambda$. Thus, the minimum value of a skinny tree of $\mathcal{T}$ equals $\sum_{i=1}^{d_L} X_{T-1}^{(i)}$.
This implies that the recursion in (8)-(10) defines a dynamic programming algorithm for computing
$\min_{\tau \subset \mathcal{T}} \mathrm{val}_{\omega}(\tau; \lambda)$. Now, let the components of the LLR vector $\lambda$ be i.i.d. random variables
distributed identically to $\{\gamma\}$, then

$$\Pi_{\lambda, d_L, d_R}(T, \omega) = P\Big\{ \sum_{i=1}^{d_L} X_{T-1}^{(i)} \le 0 \Big\}. \tag{11}$$
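
The recursion (8)-(10) translates directly into a sampler for the root value in (11), which gives a crude Monte Carlo estimate of $\Pi_{\lambda, d_L, d_R}(T, \omega)$ for small $T$. The sketch below is our own illustration, not the analytical method of the paper; as an assumption it draws $\gamma = 1 + \phi$ with $\phi \sim \mathcal{N}(0, \sigma^2)$, and its running time grows exponentially in $T$.

```python
import random

def sample_root_value(T, omega, d_L, d_R, draw):
    """Sample sum_{i=1}^{d_L} X_{T-1}^{(i)} following recursion (8)-(10).

    `draw` returns an independent realization of gamma on every call.
    """
    def sample_Y(l):
        own = omega[l] * draw()                          # omega_l * gamma
        if l == 0:
            return own                                   # equation (8)
        return own + sum(sample_X(l - 1) for _ in range(d_L - 1))   # equation (10)

    def sample_X(l):
        return min(sample_Y(l) for _ in range(d_R - 1))  # equation (9)

    return sum(sample_X(T - 1) for _ in range(d_L))

def estimate_pi(T, omega, d_L, d_R, sigma, trials=500, seed=1):
    """Monte Carlo estimate of the probability in (11) for gamma = 1 + N(0, sigma^2)."""
    rng = random.Random(seed)
    draw = lambda: 1.0 + rng.gauss(0.0, sigma)
    hits = sum(sample_root_value(T, omega, d_L, d_R, draw) <= 0 for _ in range(trials))
    return hits / trials

print(estimate_pi(T=2, omega=[1.0, 1.0], d_L=3, d_R=6, sigma=0.5))
```

With a constant $\gamma \equiv 1$ and unit weights, the sampled value simply counts the non-root variable nodes of a skinny tree, which is a convenient sanity check of the recursion.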

Given a distribution of $\{\gamma\}$ and a finite "height" $T$, it is possible to compute the distribution of
$X_l$ and $Y_l$ according to the recursion in (8)-(10) using properties of a sum of random variables and
a minimum of random variables (see Appendix B.1). The following two lemmas play a major role
in proving bounds on $\Pi_{\lambda, d_L, d_R}(T, \omega)$.

Lemma 13 ([ADS09]). For every $t \ge 0$,

$$\Pi_{\lambda, d_L, d_R}(T, \omega) \le \big( \mathbb{E} e^{-t X_{T-1}} \big)^{d_L}.$$

Let $d'_L \triangleq d_L - 1$ and $d'_R \triangleq d_R - 1$.

Lemma 14 ([ADS09]). For $0 \le s < l < T$, we have

$$\mathbb{E} e^{-t X_l} \le \big( \mathbb{E} e^{-t X_s} \big)^{{d'_L}^{\,l-s}} \cdot \prod_{k=0}^{l-s-1} \big( d'_R\, \mathbb{E} e^{-t \omega_{l-k} \gamma} \big)^{{d'_L}^{\,k}}.$$

Based on these bounds, in the following subsection we present concrete bounds on
$\Pi_{\lambda, d_L, d_R}(T, \omega)$ for the BI-AWGN channel.



4.2 Analysis for BI-AWGN Channel 

Consider the binary-input additive white Gaussian noise channel with noise variance $\sigma^2$, denoted by
BI-AWGNC($\sigma$). In the case that the all-zero codeword is transmitted, the channel input is $X_i = +1$
for every $i$. Hence, $\lambda_i^{\text{BI-AWGNC}(\sigma)} = \frac{2}{\sigma^2}(1 + \phi_i)$, where $\phi_i \sim \mathcal{N}(0, \sigma^2)$. Since
$\Pi_{\lambda,d_L,d_R}(T, \omega)$ is invariant under positive scaling of the vector $\lambda$, we consider in the following
analysis the scaled vector $\lambda$ in which $\lambda_i = 1 + \phi_i$ with $\phi_i \sim \mathcal{N}(0, \sigma^2)$.

Following [ADS09], we apply a simple analysis for BI-AWGNC($\sigma$) with a uniform weight vector
$\omega$. Then, we present improved bounds by using a non-uniform weight vector.



4.2.1 Uniform Weights 

Consider the case where $\omega = 1^T$. Let $c_1 = \mathbb{E} e^{-t X_0}$ and $c_2 = d'_R \, \mathbb{E} e^{-t \lambda_i}$, and define
$c = c_1 \cdot c_2^{1/(d_L - 2)}$. By substituting the notations $c_1$ and $c_2$ in Lemmas 13 and 14, Arora et al. [ADS09]
proved that if $c < 1$, then

$\Pi_{\lambda,d_L,d_R}(T, 1^T) \le c^{\,d_L \cdot {d'_L}^{T-1} - d_L}$.

To analyze parameters for which $\Pi_{\lambda,d_L,d_R}(T, 1^T) \to 0$, we need to compute $c_1$ and $c_2$ as
functions of $\sigma$, $d_L$, and $d_R$. Note that

$X_0 = \min_{i \in \{1, \ldots, d'_R\}} \lambda_i = 1 + \min_{i \in \{1, \ldots, d'_R\}} \phi_i$, where $\phi_i \sim \mathcal{N}(0, \sigma^2)$ i.i.d.

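Denote by $f_{\mathcal{N}}(\cdot)$ and $F_{\mathcal{N}}(\cdot)$ the p.d.f. and c.d.f. of a Gaussian random variable with zero mean and
standard deviation $\sigma$, respectively. We therefore have

$c_1(\sigma, d_L, d_R) = d'_R \, e^{-t} \int_{-\infty}^{\infty} \left( 1 - F_{\mathcal{N}}(x) \right)^{d'_R - 1} f_{\mathcal{N}}(x) \, e^{-tx} \, dx$, and    (12)

$c_2(\sigma, d_L, d_R) = d'_R \, e^{\frac{1}{2} t^2 \sigma^2 - t}$.    (13)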

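The above calculations give the following bound on $\Pi_{\lambda,d_L,d_R}(T, 1^T)$.

Lemma 15. If $\sigma > 0$ and $d_L, d_R > 2$ satisfy the condition

$c \triangleq \min_{t \ge 0} \left\{ \underbrace{\left( d'_R \, e^{-t} \int_{-\infty}^{\infty} \left( 1 - F_{\mathcal{N}}(x) \right)^{d'_R - 1} f_{\mathcal{N}}(x) \, e^{-tx} \, dx \right)}_{c_1} \cdot \Big( \underbrace{d'_R \, e^{\frac{1}{2} t^2 \sigma^2 - t}}_{c_2} \Big)^{1/(d_L - 2)} \right\} < 1$,

then for $T \in \mathbb{N}$ and $\omega = 1^T$, we have

$\Pi_{\lambda,d_L,d_R}(T, \omega) \le c^{\,d_L \cdot {d'_L}^{T-1} - d_L}$.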
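
The condition of Lemma 15 can be evaluated numerically for given parameters. The sketch below is one hedged way to do so: it approximates the integral in (12) by a midpoint Riemann sum and minimizes over a finite grid of $t$ values. The names `c1`, `c2`, `lemma15_constant`, the truncation width, the step count, and the grid are assumptions of this sketch, not the numeric procedure used in the text.

```python
import math


def gauss_pdf(x, sigma):
    """p.d.f. of N(0, sigma^2)."""
    return math.exp(-0.5 * (x / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def gauss_cdf(x, sigma):
    """c.d.f. of N(0, sigma^2)."""
    return 0.5 * (1.0 + math.erf(x / (sigma * math.sqrt(2.0))))


def c1(t, sigma, d_R, steps=4000, width=10.0):
    """Equation (12), midpoint rule on [-width*sigma, width*sigma] (assumed truncation)."""
    n = d_R - 1  # d'_R
    lo = -width * sigma
    h = 2.0 * width * sigma / steps
    total = 0.0
    for k in range(steps):
        x = lo + (k + 0.5) * h
        total += ((1.0 - gauss_cdf(x, sigma)) ** (n - 1)
                  * gauss_pdf(x, sigma) * math.exp(-t * x))
    return n * math.exp(-t) * total * h


def c2(t, sigma, d_R):
    """Equation (13) in closed form."""
    return (d_R - 1) * math.exp(0.5 * t * t * sigma * sigma - t)


def lemma15_constant(sigma, d_L, d_R, t_grid):
    """Grid approximation of c = min_t c1 * c2^(1/(d_L-2))."""
    return min(c1(t, sigma, d_R) * c2(t, sigma, d_R) ** (1.0 / (d_L - 2))
               for t in t_grid)


if __name__ == "__main__":
    grid = [0.5 * k for k in range(1, 21)]
    print(lemma15_constant(0.5, 3, 6, grid))
```

A returned value below 1 indicates that the condition holds for the tested $\sigma$ on the chosen grid; a value above 1 on a coarse grid is not conclusive, since the true minimizer may lie between grid points.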

For (3,6)-regular graphs, we obtain by numeric calculations the following corollary.

Corollary 16. Let $\sigma < 0.59$, $d_L = 3$, and $d_R = 6$. Then, there exists a constant $c < 1$ such that
for every $T \in \mathbb{N}$ and $\omega = 1^T$,

$\Pi_{\lambda,d_L,d_R}(T, \omega) \le c^{2^T}$.

Note that $\Pi_{\lambda,d_L,d_R}(T, 1^T)$ decreases doubly-exponentially as a function of $T$.
4.2.2 Improved Bounds Using Non-Uniform Weights 

The following lemma implies an improved bound for $\Pi_{\lambda,d_L,d_R}(T, \omega)$ using a non-uniform weight
vector.

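Lemma 17. Let $\sigma > 0$ and $d_L, d_R > 2$. Suppose that for some $s \in \mathbb{N}$ and some weight vector
$\bar{\omega} \in \mathbb{R}_+^s$,

$\min_{t \ge 0} \mathbb{E} e^{-t X_s} < \left( (d_R - 1) \, e^{-\frac{1}{2\sigma^2}} \right)^{-\frac{1}{d_L - 2}}$.    (14)

Let $\omega^{(\rho)} \in \mathbb{R}_+^T$ denote the concatenation of the vector $\bar{\omega} \in \mathbb{R}_+^s$ and the vector $(\rho, \ldots, \rho) \in \mathbb{R}_+^{T-s}$.
Then, for every $T > s$ there exist constants $c < 1$ and $\rho > 0$ such that

$\Pi_{\lambda,d_L,d_R}(T, \omega^{(\rho)}) \le \left( (d_R - 1) \, e^{-\frac{1}{2\sigma^2}} \right)^{-\frac{d_L}{d_L - 2}} \cdot c^{\,d_L \cdot (d_L - 1)^{T-s-1}}$.

Proof. By Lemma 14, we have

$\mathbb{E} e^{-t X_{T-1}} \le \left( \mathbb{E} e^{-t X_s} \right)^{(d_L - 1)^{T-s-1}} \left( (d_R - 1) \, \mathbb{E} e^{-t \rho (1 + \phi)} \right)^{\sum_{k=0}^{T-s-2} (d_L - 1)^k}$.

Note that $\mathbb{E} e^{-t \rho (1 + \phi)} = e^{-t\rho + \frac{1}{2} t^2 \rho^2 \sigma^2}$ is minimized when $t\rho = \sigma^{-2}$. By setting $\rho = \frac{1}{t \sigma^2}$ we obtain

$\mathbb{E} e^{-t X_{T-1}} \le \left( \mathbb{E} e^{-t X_s} \right)^{(d_L - 1)^{T-s-1}} \left( (d_R - 1) \, e^{-\frac{1}{2\sigma^2}} \right)^{\frac{(d_L - 1)^{T-s-1} - 1}{d_L - 2}}$

$= \left( \mathbb{E} e^{-t X_s} \left( (d_R - 1) \, e^{-\frac{1}{2\sigma^2}} \right)^{\frac{1}{d_L - 2}} \right)^{(d_L - 1)^{T-s-1}} \cdot \left( (d_R - 1) \, e^{-\frac{1}{2\sigma^2}} \right)^{-\frac{1}{d_L - 2}}$.

Let $c = \min_{t \ge 0} \left\{ \mathbb{E} e^{-t X_s} \left( (d_R - 1) \, e^{-\frac{1}{2\sigma^2}} \right)^{\frac{1}{d_L - 2}} \right\}$. By (14), $c < 1$. Let $t^* = \arg\min_{t \ge 0} \mathbb{E} e^{-t X_s}$; then

$\mathbb{E} e^{-t^* X_{T-1}} \le c^{(d_L - 1)^{T-s-1}} \cdot \left( (d_R - 1) \, e^{-\frac{1}{2\sigma^2}} \right)^{-\frac{1}{d_L - 2}}$.

Using Lemma 13, we conclude that

$\Pi_{\lambda,d_L,d_R}(T, \omega^{(\rho)}) \le c^{\,d_L \cdot (d_L - 1)^{T-s-1}} \cdot \left( (d_R - 1) \, e^{-\frac{1}{2\sigma^2}} \right)^{-\frac{d_L}{d_L - 2}}$,

and the lemma follows. □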
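
The choice of $\rho$ in the proof can be checked numerically. The sketch below evaluates the closed form of $\mathbb{E} e^{-t\rho(1+\phi)}$ for Gaussian $\phi$, confirms that $\rho = 1/(t\sigma^2)$ attains the value $e^{-1/(2\sigma^2)}$, and evaluates the right-hand side of the bound in Lemma 17. The function names `mgf_term`, `best_rho`, and `lemma17_bound` are hypothetical helpers introduced only for this illustration.

```python
import math


def mgf_term(t, rho, sigma):
    """E exp(-t*rho*(1 + phi)) for phi ~ N(0, sigma^2), in closed form."""
    u = t * rho
    return math.exp(-u + 0.5 * u * u * sigma * sigma)


def best_rho(t, sigma):
    """The minimizing weight rho = 1 / (t * sigma^2) used in the proof."""
    return 1.0 / (t * sigma * sigma)


def lemma17_bound(c, sigma, d_L, d_R, T, s):
    """Right-hand side of the bound in Lemma 17 for a given constant c < 1."""
    base = (d_R - 1) * math.exp(-1.0 / (2.0 * sigma * sigma))
    return base ** (-d_L / (d_L - 2)) * c ** (d_L * (d_L - 1) ** (T - s - 1))


if __name__ == "__main__":
    sigma, t = 0.7, 0.11
    rho = best_rho(t, sigma)
    print(rho, mgf_term(t, rho, sigma), math.exp(-1.0 / (2.0 * sigma ** 2)))
```

For $d_L = 3$ and $d_R = 6$ the prefactor reduces to $\frac{1}{125} e^{3/(2\sigma^2)}$.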

Arora et al. [ADS09] suggested using a weight vector $\bar{\omega}$ with components $\bar{\omega}_l = (d_L - 1)^l$. This
weight vector has the effect that if $\lambda$ assigns the same value to every variable node, then every level
in a skinny tree $\tau$ contributes equally to $\mathrm{val}_{\bar{\omega}}(\tau; \lambda)$. For $T > s$, consider a weight vector
$\omega^{(\rho)} \in \mathbb{R}_+^T$ defined by

$\omega_l^{(\rho)} = \begin{cases} \bar{\omega}_l & \text{if } 0 \le l < s, \\ \rho & \text{if } s \le l < T. \end{cases}$

Note that the first $s$ components of $\omega^{(\rho)}$ are non-uniform while the other components are uniform.
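
As a small illustration of this construction, the following sketch builds the concatenated weight vector as a Python list. The function name `weight_vector` and the zero-based indexing are assumptions of the sketch.

```python
def weight_vector(T, s, rho, d_L):
    """First s entries (d_L - 1)^l for l = 0..s-1, remaining T - s entries equal to rho."""
    if not 0 <= s < T:
        raise ValueError("the construction requires 0 <= s < T")
    return [float((d_L - 1) ** l) for l in range(s)] + [float(rho)] * (T - s)


if __name__ == "__main__":
    print(weight_vector(T=5, s=2, rho=0.5, d_L=3))
```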

For given $\sigma$, $d_L$, and $d_R$, and for a concrete value $s$, we can compute the distribution of
$X_s$ using the recursion in (8)-(10). Moreover, we can also compute the value $\min_{t \ge 0} \mathbb{E} e^{-t X_s}$.
Computing the distribution and the Laplace transform of $X_s$ is not a trivial task in the case where
the components of $\lambda$ have a continuous density distribution function. However, since the Gaussian
distribution function is smooth and most of its volume is concentrated in a defined interval, it is
possible to "simulate" the evolution of the density distribution functions of the random variables
$X_l$ and $Y_l$ for $l \le s$. We use a numerical method based on quantization in order to represent and
evaluate the functions $f_{X_l}(\cdot)$, $F_{X_l}(\cdot)$, $f_{Y_l}(\cdot)$, and $F_{Y_l}(\cdot)$. This computation follows methods used
in the implementation of the density evolution technique (see, e.g., [RU08]). A specific method for
computation is described in Appendix B and exemplified for (3,6)-regular graphs.

Table 1: Computed values of $\sigma_0$ for finite $s$ in Corollary 18, and their corresponding $E_b/N_0$ SNR
measure in dB.
For (3,6)-regular bipartite graphs we obtain the following corollary.

Corollary 18. Let $\sigma < \sigma_0$, $d_L = 3$, and $d_R = 6$. For the values of $\sigma_0$ and $s$ listed in Table 1,
there exists a constant $c < 1$ such that for every $T > s$,

$\Pi_{\lambda,d_L,d_R}(T, \omega) \le \frac{1}{125} \, e^{\frac{3}{2\sigma^2}} \cdot c^{2^{T-s}}$.

Note that for a fixed $s$, the probability $\Pi_{\lambda,d_L,d_R}(T, \omega)$ decreases doubly-exponentially as a
function of $T$. Since it is required that $s < T$, Corollary 18 applies only to codes whose Tanner graphs
have girth larger than $4T$.
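
Table 1 reports each $\sigma_0$ together with an SNR measure in dB. A minimal conversion sketch follows, assuming unit-energy antipodal signaling so that $E_b/N_0 = 1/(2R\sigma^2)$, and assuming the design rate $R = 1 - d_L/d_R$. Both the formula and the function name `ebn0_db` are assumptions of this sketch rather than definitions taken from the text.

```python
import math


def ebn0_db(sigma, rate):
    """Eb/N0 in dB under the assumed relation Eb/N0 = 1 / (2 * rate * sigma^2)."""
    return 10.0 * math.log10(1.0 / (2.0 * rate * sigma * sigma))


if __name__ == "__main__":
    # Design rate of a (3,6)-regular code: 1 - 3/6 = 1/2.
    print(round(ebn0_db(0.735, 0.5), 2))
```

Under these assumptions, $\sigma = 0.735$ with $R = 1/2$ gives roughly 2.67 dB, which agrees with the pair of values quoted in the abstract.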

Theorem 1 follows from Lemma 12, Lemma 15, and Corollary 18 as follows. The first part,
which states a finite-length result, follows from Lemma 12 and Corollary 18 by taking
$s = 0 < T < \frac{1}{4}\mathrm{girth}(G)$, which holds for any Tanner graph $G$. The second part, which deals with an
asymptotic result, follows from Lemma 12 and Corollary 18 by fixing $s = 22$ and taking $g = \Omega(\log n)$
sufficiently large such that $s < T = O(\log n) < \frac{1}{4}\mathrm{girth}(G)$. It therefore provides a lower bound
on the threshold of LP-decoding. The third part, which states a finite-length result for any
$(d_L, d_R)$-regular LDPC code, follows from Lemma 12 and Lemma 15. Theorem 2 is obtained in the same
manner after a straightforward modification of Lemma 15 to MBIOS channels.

Remark: Following [ADS09], the contribution $\omega_T \cdot \lambda_{v_0}$ of the root of $\mathcal{T}_{v_0}$ is not included in
the definition of $\mathrm{val}_\omega(\tau; \lambda)$. The effect of this contribution on $\Pi_{\lambda,d_L,d_R}(T, \omega)$ is bounded by a
multiplicative factor, as implied by the proof of Lemma 13. The multiplicative factor is bounded
by $\mathbb{E} e^{-t \omega_T \lambda_{v_0}}$, which may be regarded as a constant since it does not depend on the code parameters
(in particular the code length $n$). Therefore, we can set $\omega_T = 0$ without loss of generality for these
asymptotic considerations.

5 Discussion 

We extended the analysis of Arora et al. [ADS09] for LP-decoding over a BSC to any MBIOS
channel. We proved bounds on the word error probability that are inverse doubly-exponential in
the girth of the factor graph for LP-decoding of regular LDPC codes over MBIOS channels. We
also proved lower bounds on the threshold of regular LDPC codes whose Tanner graphs have
logarithmic girth under LP-decoding over the binary-input AWGN channel.

Although thresholds are regarded as an asymptotic result, the analysis presented by Arora et
al. [ADS09], as well as its extension presented in this paper, exhibits both asymptotic results and
finite-length results. An interesting tradeoff between these two perspectives is shown by
the formulation of the results. We regard the goal of achieving the highest possible thresholds as an
asymptotic goal, and as such we may compare the achieved thresholds to the asymptotic BP-based
thresholds. Note that the obtained lower bound on the threshold increases up to a certain ceiling
value (which we conjecture is below the LP threshold) as the assumed girth increases. Thus, an
asymptotic result is obtained.

However, in the case of finite-length codes, the analysis cannot be based on an infinite girth in
the limit. Two phenomena occur in the analysis of finite codes: (i) the size of the interval $[0, \sigma_0]$
for which the error bound holds increases as a function of the girth (as shown in Table 1), and (ii)
the decoding error probability decreases exponentially as a function of the gap $\sigma_0 - \sigma$ (as implied
by Figure 5(b)). We demonstrated the power of the analysis for the finite-length case by presenting
error bounds for any (3,6)-regular LDPC code as a function of the girth of the Tanner graph, provided
that $\sigma \le 0.605$. Assuming that the girth of the Tanner graph is greater than 88, an error bound
is presented provided that $\sigma \le 0.735$. This proof also shows that 0.735 is a lower bound on the
threshold in the asymptotic case.

In the proof of LP optimality (Lemma 8 and Theorem 9) we used the combinatorial interpretation
of LP-decoding via graph covers [VK05] to infer a reduction to conditions of ML optimality.
That is, the decomposition of codewords presented by Arora et al. [ADS09] leads to a
decomposition for fractional LP solutions. This method of reducing combinatorial characterizations of
LP-decoding to combinatorial characterizations of ML decoding is based on graph cover decoding.

Future directions: The technique for proving error bounds for the BI-AWGN channel described in
Section 4 and in Appendix B is based on a min-sum probabilistic process on a tree. The process is
characterized by an evolution of probability density functions. Computing the evolving densities
in the analysis of AWGN channels is not a trivial task. As indicated by our numeric computations,
the evolving density functions in the case of the AWGN channel visually resemble Gaussian
probability density functions (see Figures 2 and 3). Chung et al. [CRU01] presented a method for
estimating thresholds of belief propagation decoding according to density evolution using a Gaussian
approximation. Applying an appropriate Gaussian approximation technique to our analysis
may result in analytic asymptotic approximate thresholds of LP-decoding for regular LDPC codes
over AWGN channels.

Feldman et al. [FKV05] observed that for high SNRs, truncating the LLRs of the BI-AWGNC
surprisingly assists LP-decoding. They proved that for certain families of regular LDPC codes and large
enough SNRs (i.e., small $\sigma$), it is advantageous to truncate the LLRs before passing them to the
LP decoder. The method presented in Appendix B for computing densities evolving on trees using
quantization and truncation of the LLRs can be applied to this case. It is interesting to see whether
this unexpected phenomenon of LP-decoding occurs also for larger values of $\sigma$ (i.e., lower SNRs).

A Graph Cover Decoding - Basic Terms and Notation 

Vontobel and Koetter introduced in [VK05] a combinatorial concept called graph-cover decoding
(GCD) for decoding codes on graphs, and showed its equivalence to LP-decoding. The characterization
of GCD provides a useful theoretical tool for the analysis of LP-decoding and its connections
to iterative message-passing decoding algorithms. We use the characterization of graph-cover
decoding in the statement of Lemma 8 and the proof of Theorem 9. In the following, we define some
basic terms and notations with respect to graph covers and graph-cover decoding.

Let $G$ and $\tilde{G}$ be finite graphs and let $\pi : \tilde{G} \to G$ be a graph homomorphism, namely,
$\forall \tilde{u}, \tilde{v} \in V(\tilde{G}) : (\tilde{u}, \tilde{v}) \in E(\tilde{G}) \Rightarrow (\pi(\tilde{u}), \pi(\tilde{v})) \in E(G)$. A homomorphism $\pi$ is a covering map if for
every $\tilde{v} \in V(\tilde{G})$ the restriction of $\pi$ to neighbors of $\tilde{v}$ is a bijection to the neighbors of $\pi(\tilde{v})$. The
pre-image $\pi^{-1}(v)$ of a node $v$ is called a fiber and is denoted by $\tilde{G}_v$. It is easy to see that all the
fibers have the same cardinality if $G$ is connected. This common cardinality is called the degree or
fold number of the covering map. If $\pi : \tilde{G} \to G$ is a covering map, we call $G$ the base graph and
$\tilde{G}$ a cover of $G$. In the case where the fold number of the covering map is $M$, we say that $\tilde{G}$ is an
$M$-cover of $G$.

Given a base graph $G$ and a natural fold number $M$, an $M$-cover $\tilde{G}$ and a covering map
$\pi : \tilde{G} \to G$ can be constructed in the following way. Map every vertex $(v, i) \in V(\tilde{G})$ (where
$i \in \{1, \ldots, M\}$) to $v \in V(G)$, i.e., $\pi(v, i) = v$. The edges in $E(\tilde{G})$ are obtained by specifying a
matching $D_{(u,v)}$ of $M$ edges between $\pi^{-1}(u)$ and $\pi^{-1}(v)$ for every $(u, v) \in E(G)$.
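
The construction above can be sketched in a few lines. The following is an illustrative sketch only: it picks each matching $D_{(u,v)}$ as a uniformly random permutation, which is one possible choice among the $M!$ matchings per edge, and it indexes the copies from 0 rather than from 1. The names `random_m_cover` and `covering_map` are hypothetical.

```python
import random


def random_m_cover(edges, M, seed=0):
    """Return the edge list of an M-cover; cover nodes are pairs (v, i), i in 0..M-1."""
    rng = random.Random(seed)
    cover_edges = []
    for (u, v) in edges:
        perm = list(range(M))
        rng.shuffle(perm)  # a perfect matching between the fibers of u and v
        cover_edges.extend(((u, i), (v, perm[i])) for i in range(M))
    return cover_edges


def covering_map(node):
    """pi(v, i) = v."""
    return node[0]


if __name__ == "__main__":
    base = [(0, 1), (1, 2), (2, 3), (3, 0)]  # a 4-cycle as a toy base graph
    for edge in random_m_cover(base, M=3, seed=1):
        print(edge)
```

Because every base edge contributes a perfect matching between the two fibers, each cover node keeps the degree of its image in the base graph, which is what the covering-map condition requires.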

Note that the term 'covering' originates from covering maps in topology, as opposed to other
notions of 'coverings' in graphs or codes (e.g., vertex covers or covering codes).

We now define assignments to variable nodes in an $M$-cover of a Tanner graph. The assignment
is induced by the covering map and an assignment to the variable nodes in the base graph.

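Definition 19 (lift, [VK05]). Consider a bipartite graph $G = (\mathcal{I} \cup \mathcal{J}, E)$ and an arbitrary
$M$-cover $\tilde{G} = (\tilde{\mathcal{I}} \cup \tilde{\mathcal{J}}, \tilde{E})$ of $G$. The $M$-lift of a vector $x \in \mathbb{R}^N$ is an assignment $\tilde{x} \in \mathbb{R}^{N \cdot M}$ to the
nodes in $\tilde{\mathcal{I}}$ that is induced by the assignment $x \in \mathbb{R}^N$ to the nodes in $\mathcal{I}$ and the covering map
$\pi : \tilde{G} \to G$ as follows: every $\tilde{v} \in \pi^{-1}(v)$ is assigned by $\tilde{x}$ the value assigned to $v$ by $x$. The $M$-lift
of a vector $x$ is denoted by $x^{\uparrow M}$.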

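Definition 20 (pseudo-codeword, [VK05]). The (scaled) pseudo-codeword $p(\tilde{x}) \in \mathbb{Q}^N$ associated
with a binary vector $\tilde{x} = \{\tilde{x}_{\tilde{v}}\}_{\tilde{v} \in \tilde{\mathcal{I}}} \in \tilde{C}$ of length $N \cdot M$ is the rational vector
$p(\tilde{x}) = (p_1(\tilde{x}), p_2(\tilde{x}), \ldots, p_N(\tilde{x}))$ defined by

$p_i(\tilde{x}) = \frac{1}{M} \sum_{\tilde{v} \in \pi^{-1}(v_i)} \tilde{x}_{\tilde{v}}$,

where the sum is taken in $\mathbb{R}$ (not in $\mathbb{F}_2$).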
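
Definitions 19 and 20 can be illustrated together: averaging an $M$-lift over each fiber returns the original vector. The sketch below assumes cover nodes are labeled by pairs `(v, i)` with zero-based indices and uses exact rational arithmetic; the names `lift` and `pseudo_codeword` are hypothetical.

```python
from fractions import Fraction


def lift(x, M):
    """M-lift: every node (v, i) in the fiber of v receives the value x[v]."""
    return {(v, i): x[v] for v in range(len(x)) for i in range(M)}


def pseudo_codeword(x_tilde, N, M):
    """Scaled pseudo-codeword: per-fiber average, with the sum taken over the reals."""
    return [Fraction(sum(x_tilde[(v, i)] for i in range(M)), M) for v in range(N)]


if __name__ == "__main__":
    x = [0, 1, 1]
    print(pseudo_codeword(lift(x, 4), N=3, M=4))
    # A cover assignment that is not a lift yields fractional entries.
    mixed = {(0, 0): 1, (0, 1): 0, (1, 0): 1, (1, 1): 1}
    print(pseudo_codeword(mixed, N=2, M=2))
```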

B Computing the Evolution of Probability Densities over Trees 

In this appendix we present a computational method for estimating $\min_{t \ge 0} \mathbb{E} e^{-t X_s}$ for some
concrete $s$. The random variable $X_s$ is defined by the recursion in (8)-(10). Let $\{\gamma\}$ denote an ensemble
of i.i.d. continuous random variables with probability density function (p.d.f.) $f_\gamma(\cdot)$ and cumulative
distribution function (c.d.f.) $F_\gamma(\cdot)$.

We demonstrate the method for computing $\min_{t \ge 0} \mathbb{E} e^{-t X_s}$ for the case where $d_L = 3$, $d_R = 6$,
$\omega_l = (d_L - 1)^l = 2^l$, $\sigma = 0.7$, and $\gamma = 1 + \phi$ where $\phi \sim \mathcal{N}(0, \sigma^2)$. In this case,

$f_\gamma(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \, e^{-\frac{(x-1)^2}{2\sigma^2}}$, and
$F_\gamma(x) = \frac{1}{2} \left[ 1 + \mathrm{erf}\left( \frac{x - 1}{\sqrt{2}\,\sigma} \right) \right]$,

where $\mathrm{erf}(x) = \frac{2}{\sqrt{\pi}} \int_0^x e^{-t^2} dt$ denotes the error function.

The actual computation of the evolution of density functions via the recursion equations requires
a numeric implementation. Finding an implementation that is both efficient and stable is
non-trivial. We follow methods used in the computation of the variable-node update process in the
implementation of density evolution analysis (see, e.g., [RU08]).

We first state two properties of random variables for the evolving process defined in the
recursion. We then show a method for computing a proper representation of the probability density
function of $X_s$ for the purpose of finding $\min_{t \ge 0} \mathbb{E} e^{-t X_s}$.

B.1 Properties of Random Variables

Sum of Random Variables. Let $\Phi$ denote a random variable that equals the sum of $n$ independent
random variables $\{\phi_i\}_{i=1}^n$, i.e., $\Phi = \sum_{i=1}^n \phi_i$. Denote by $f_{\phi_i}(\cdot)$ the p.d.f. of $\phi_i$. Then, the
p.d.f. of $\Phi$ is given by

$f_\Phi = \mathop{\ast}_{i \in \{1, \ldots, n\}} f_{\phi_i}$,    (16)

where $\ast$ denotes the standard convolution operator over $\mathbb{R}$ or over $\mathbb{Z}$.
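
For densities sampled on a uniform grid, the convolution in (16) can be approximated by a Riemann sum. The sketch below is a direct quadratic-time version written for clarity, under the assumption that both inputs are sampled with the same step `delta`; the names `convolve` and `convolve_all` are hypothetical.

```python
from functools import reduce


def convolve(f, g, delta):
    """Riemann-sum convolution of two sampled densities with common step delta."""
    out = [0.0] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            out[i + j] += a * b * delta
    return out


def convolve_all(pdfs, delta):
    """Density of a sum of independent variables: fold `convolve` over the list."""
    return reduce(lambda acc, f: convolve(acc, f, delta), pdfs)


if __name__ == "__main__":
    delta = 0.01
    uniform = [1.0] * 100          # density of U[0, 1) sampled at 0, 0.01, ..., 0.99
    tri = convolve(uniform, uniform, delta)
    print(len(tri), sum(tri) * delta, max(tri))
```

If the first sample of `f` sits at $x_f$ and the first sample of `g` at $x_g$, the first sample of the output sits at $x_f + x_g$; the caller has to track this offset.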

Minimum of Random Variables. Let $\Phi$ denote a random variable that equals the minimum
of $n$ i.i.d. random variables $\{\phi_i\}_{i=1}^n$, i.e., $\Phi = \min_{1 \le i \le n} \phi_i$. Denote by $f_\phi(\cdot)$ and $F_\phi(\cdot)$ the p.d.f.
and c.d.f. of $\phi \sim \phi_i$, respectively. Then, the p.d.f. and c.d.f. of $\Phi$ are given by

$f_\Phi(x) = n \cdot \left( 1 - F_\phi(x) \right)^{n-1} f_\phi(x)$, and    (17)
$F_\Phi(x) = 1 - \left( 1 - F_\phi(x) \right)^n$.    (18)

B.2 Computing Distributions of $X_l$ and $Y_l$

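The base case of the recursion in (8)-(10) is given by $Y_0$. Let $g_{\omega_l}(\cdot)$ denote the p.d.f. of the scaled
random variable $\omega_l \gamma$, i.e.,

$g_{\omega_l}(y) = \frac{1}{\omega_l} f_\gamma\left( \frac{y}{\omega_l} \right)$.    (19)

Then, the p.d.f. of $Y_0$ is simply written as

$f_{Y_0}(y) = g_{\omega_0}(y)$.    (20)

In the case where $\gamma = 1 + \mathcal{N}(0, \sigma^2)$, Equation (20) simplifies to

$f_{Y_0}(y) = \frac{1}{\omega_0} f_{\mathcal{N}}\left( \frac{y}{\omega_0} - 1 \right)$, and    (21)
$F_{Y_0}(y) = F_{\mathcal{N}}\left( \frac{y}{\omega_0} - 1 \right)$.    (22)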

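Let $f^{\ast d}(\cdot)$ denote the $d$-fold convolution of a function $f(\cdot)$, i.e., the convolution of the function
$f(\cdot)$ with itself $d$ times. Following (16)-(18), the recursion equations for the p.d.f. and c.d.f. of $X_l$
and $Y_l$ are given by

$f_{X_l}(x) = (d_R - 1) \left( 1 - F_{Y_l}(x) \right)^{d_R - 2} f_{Y_l}(x)$,    (23)
$F_{X_l}(x) = 1 - \left( 1 - F_{Y_l}(x) \right)^{d_R - 1}$,    (24)
$f_{Y_l}(y) = \left( g_{\omega_l} \ast f_{X_{l-1}}^{\ast (d_L - 1)} \right)(y)$, and    (25)
$F_{Y_l}(y) = \int_{-\infty}^{y} f_{Y_l}(t) \, dt$.    (26)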
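
The base case (21)-(22) and the minimum step (23)-(24) can be evaluated pointwise on a grid. The following minimal sketch does this for $l = 0$; the grid limits, the step size, and the helper names `y0_pdf`, `y0_cdf`, and `x_from_y` are assumptions made for illustration.

```python
import math


def gauss_pdf(x, sigma):
    return math.exp(-0.5 * (x / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def gauss_cdf(x, sigma):
    return 0.5 * (1.0 + math.erf(x / (sigma * math.sqrt(2.0))))


def y0_pdf(y, omega0, sigma):
    """Equation (21): density of Y_0 = omega_0 * (1 + phi)."""
    return gauss_pdf(y / omega0 - 1.0, sigma) / omega0


def y0_cdf(y, omega0, sigma):
    """Equation (22)."""
    return gauss_cdf(y / omega0 - 1.0, sigma)


def x_from_y(f_y, F_y, d_R):
    """Equations (23)-(24) applied pointwise to sampled f_Y and F_Y."""
    n = d_R - 1
    f_x = [n * (1.0 - F) ** (n - 1) * f for f, F in zip(f_y, F_y)]
    F_x = [1.0 - (1.0 - F) ** n for F in F_y]
    return f_x, F_x


if __name__ == "__main__":
    delta, sigma, omega0 = 0.01, 0.7, 1.0
    grid = [delta * k for k in range(-500, 701)]      # assumed support [-5, 7]
    f_y = [y0_pdf(y, omega0, sigma) for y in grid]
    F_y = [y0_cdf(y, omega0, sigma) for y in grid]
    f_x, F_x = x_from_y(f_y, F_y, d_R=6)
    print(sum(f_y) * delta, sum(f_x) * delta)         # both should be close to 1
```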



Since we cannot analytically solve (23)-(26), we use a numeric method based on quantization
in order to represent and evaluate the functions $f_{X_l}(\cdot)$, $F_{X_l}(\cdot)$, $f_{Y_l}(\cdot)$, and $F_{Y_l}(\cdot)$. As suggested in
[RU08], we compute a uniform sample of the functions, i.e., we consider the functions over the
set $\delta\mathbb{Z}$, where $\delta$ denotes the quantization step size. Moreover, for practical reasons we restrict
the functions to a finite support, namely, $\{\delta k\}_{k=M}^{N}$ for some integers $M < N$. We denote the set
$\{\delta k\}_{k=M}^{N}$ by $\delta[M, N]$. Obviously, the choice of $\delta$, $M$, and $N$ determines the precision of our target
computation. Depending on the quantized function, it is also common to consider point masses
at points not in $\delta[M, N]$. For example, in the case where the density function has a heavy tail
above $\delta N$, we may assign the value $+\infty$ to the mass of the tail as an additional quantization point.
The same applies analogously to a heavy tail below $\delta M$.

A Gaussian-like function (bell-shaped function) is bounded and continuous, and so are its
derivatives. The area beneath its tails decays exponentially and becomes negligible a few standard
deviations away from the mean. Thus, Gaussian-like functions are amenable to quantization and
truncation of the tails. We therefore choose to zero the density functions outside the interval
$[\delta M, \delta N]$. The parameters $M$ and $N$ are symmetric around the mean, and together with $\delta$ are
chosen to make the error of a Riemann integral negligible. As we demonstrate by computations,
the density functions $f_{X_l}(\cdot)$ and $f_{Y_l}(\cdot)$ are indeed bell-shaped, justifying the quantization. Figure 1
illustrates the p.d.f. of $X_0$ (here $X_0$ equals the minimum of $d_R - 1 = 5$ instances of $Y_0$). Note
that by definition, $Y_0$ is a Gaussian random variable.

Computing $f_{Y_l}(\cdot)$ given $f_{X_{l-1}}(\cdot)$ requires the convolution of functions. However, the restriction
of the density functions to a restricted support $\delta[M, N]$ is not invariant under convolution. That
is, if the function $f$ is supported by $\delta[M, N]$, then $f \ast f$ is supported by $\delta[2M, 2N]$. In the
quantized computations of $f_{X_l}(\cdot)$ and $f_{Y_l}(\cdot)$, our numeric calculations show that the mean and standard
deviation of the random variables $X_l$ and $Y_l$ increase exponentially in $l$, as illustrated in Figures 2
and 3. Therefore, the maximal slopes of the density functions $f_{X_l}(\cdot)$ and $f_{Y_l}(\cdot)$ decrease with $l$.
This property allows us to double the quantization step $\delta$ as $l$ increases by one (doubling applies to
the demonstrated parameters, i.e., $d_L = 3$ and $\omega_l = 2^l$). Thus, the size
of the support used for $f_{X_l}(\cdot)$ and $f_{Y_l}(\cdot)$ does not grow. Specifically, the interval $\delta[M, N]$ doubles
but the doubling of $\delta$ keeps the number of points fixed. This method helps keep the computation
tractable while keeping the error small.

Figure 1: Probability density functions of $X_0$ and $Y_0$ for $(d_L, d_R) = (3, 6)$ and $\sigma = 0.7$.

For two quantized functions $f$ and $g$, the calculation of $f \ast g$ can be efficiently performed using
the Fast Fourier Transform (FFT). First, in order to prevent aliasing, extend the support with zeros
(i.e., zero padding) so that it equals the support of $f \ast g$. Then, $f \ast g = \mathrm{IFFT}(\mathrm{FFT}(f) \times \mathrm{FFT}(g))$,
where $\times$ denotes a coordinate-wise multiplication. The outcome is scaled by the quantization
step size $\delta$. In fact, the evaluation of $f_{Y_l}(\cdot)$ requires $d_L - 1$ convolutions and is performed in the
frequency domain (without returning to the time domain in between) by a proper zero padding
prior to performing the FFT.
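
The FFT-based convolution described above can be sketched without external libraries. The code below uses a small recursive radix-2 FFT written for this illustration (a production implementation would normally call an optimized FFT library instead); padding to a power of two, the convention that a $d$-fold self-convolution involves $d$ copies of $f$, and the names `fft`, `fft_convolve`, and `fft_self_convolve` are assumptions of the sketch.

```python
import cmath


def fft(a, invert=False):
    """Recursive radix-2 FFT; len(a) must be a power of two. Inverse is unnormalized."""
    n = len(a)
    if n == 1:
        return list(a)
    even = fft(a[0::2], invert)
    odd = fft(a[1::2], invert)
    sign = 1.0 if invert else -1.0
    out = [0j] * n
    for k in range(n // 2):
        w = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + w
        out[k + n // 2] = even[k] - w
    return out


def _padded_size(size):
    n = 1
    while n < size:
        n *= 2
    return n


def fft_convolve(f, g, delta):
    """f * g via zero padding, pointwise product, inverse FFT, and scaling by delta."""
    size = len(f) + len(g) - 1
    n = _padded_size(size)
    fa = fft([complex(v) for v in f] + [0j] * (n - len(f)))
    fb = fft([complex(v) for v in g] + [0j] * (n - len(g)))
    inv = fft([x * y for x, y in zip(fa, fb)], invert=True)
    return [delta * (v.real / n) for v in inv[:size]]


def fft_self_convolve(f, d, delta):
    """Convolution of d copies of f, staying in the frequency domain throughout."""
    size = d * (len(f) - 1) + 1
    n = _padded_size(size)
    spectrum = fft([complex(v) for v in f] + [0j] * (n - len(f)))
    inv = fft([z ** d for z in spectrum], invert=True)
    return [delta ** (d - 1) * (v.real / n) for v in inv[:size]]


if __name__ == "__main__":
    print(fft_convolve([1.0, 2.0, 3.0], [0.0, 1.0, 0.5], delta=1.0))
    print(fft_self_convolve([1.0, 1.0], d=3, delta=1.0))
```

Padding to the full output length before the transform is what prevents the circular wrap-around (aliasing) mentioned in the text.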

Note that when $\gamma$ is a discrete random variable with a bounded support (as in [ADS09]), a
precise computation of the probability distribution function of $X_s$ is obtained by following
(23)-(26).

Figure 2: Probability density functions of $X_l$ for $l = 0, \ldots, 4$, $(d_L, d_R) = (3, 6)$ and $\sigma = 0.7$.

Figure 3: Probability density functions of $Y_l$ for $l = 0, \ldots, 4$, $(d_L, d_R) = (3, 6)$ and $\sigma = 0.7$.

Figure 4: $\ln(\mathbb{E} e^{-t X_s})$ as a function of $t$ for $s = 4, 6, 8, 10, 12$, $(d_L, d_R) = (3, 6)$ and $\sigma = 0.7$. Plot
(b) is an enlargement of the rectangle depicted in plot (a).



B.3 Estimating $\min_{t \ge 0} \mathbb{E} e^{-t X_s}$

After obtaining a proper discretized representation of the p.d.f. of $X_s$, we approximate $\mathbb{E} e^{-t X_s}$ for
a given $t$ by

$\mathbb{E} e^{-t X_s} \approx \sum_{k=M}^{N} \delta \cdot f_{X_s}(\delta k) \cdot e^{-t \delta k}$.

We then estimate the minimum value by searching over values of $t \ge 0$. Figure 4 depicts
$\ln(\mathbb{E} e^{-t X_s})$ as a function of $t \in (0, 0.5]$ for $s = 4, 6, 8, 10, 12$. The numeric calculations show
that as $t$ grows from zero, the function $\mathbb{E} e^{-t X_s}$ decreases to a minimum value, and then increases
rapidly. We can also observe that both the values $\min_{t \ge 0} \mathbb{E} e^{-t X_s}$ and $\arg\min_{t \ge 0} \mathbb{E} e^{-t X_s}$ decrease
as a function of $s$.
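
The Riemann-sum estimate and the search over $t$ can be sketched as follows. For a self-checking example the sketch uses a sampled $\mathcal{N}(1, 1)$ density, for which $\mathbb{E} e^{-tX} = e^{-t + t^2/2}$ is known in closed form, rather than an actual $f_{X_s}$; the function names and the finite search grid are assumptions of this illustration.

```python
import math


def laplace_estimate(pdf, M, delta, t):
    """Riemann-sum estimate of E exp(-t X), where pdf[k] samples f at delta * (M + k)."""
    return sum(delta * p * math.exp(-t * delta * (M + k)) for k, p in enumerate(pdf))


def minimize_over_t(pdf, M, delta, t_grid):
    """Return (minimum value, minimizing t) over a finite grid of t values."""
    return min((laplace_estimate(pdf, M, delta, t), t) for t in t_grid)


if __name__ == "__main__":
    delta, M, N = 0.01, -1000, 1200
    # Stand-in density: N(1, 1) sampled on delta * [M, N].
    pdf = [math.exp(-0.5 * (delta * k - 1.0) ** 2) / math.sqrt(2.0 * math.pi)
           for k in range(M, N + 1)]
    value, t_star = minimize_over_t(pdf, M, delta, [0.1 * k for k in range(31)])
    print(value, t_star, math.exp(-0.5))
```

A grid search only locates the minimizer up to the grid spacing; since the curves in Figure 4 have a single minimum, a refinement around the best grid point would be a natural next step.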

Following Lemma 17, we are interested in the maximum value of $\sigma$ for which (14) holds for a
given $s$. That is,

$\sigma_0 = \sup \left\{ \sigma > 0 \;\middle|\; \min_{t \ge 0} \mathbb{E} e^{-t X_s} \cdot \left( (d_R - 1) \, e^{-\frac{1}{2\sigma^2}} \right)^{\frac{1}{d_L - 2}} < 1 \right\}$.    (27)

Note that if the set in (27) is not empty, then it is an open interval $(0, \sigma_0)$. Figure 5(a)
illustrates the region in the $(t, \sigma)$ plane for which (14) holds with $s = 4$.

Figure 5: (a) Region for which $5 e^{-\frac{1}{2\sigma^2}} \, \mathbb{E} e^{-t X_4} < 1$ as a function of $t$ and $\sigma$ for $(d_L, d_R) = (3, 6)$.
Note that the maximal value of $\sigma$ contained in that region results in the estimate of $\sigma_0 = 0.685$ in
the entry $s = 4$ in Table 1. (b) Constant $c$ in Corollary 18 as a function of $\sigma$ in the case where $s = 4$
and $t = 0.11$, i.e., the value of $c$ over the cut of the $(t, \sigma)$-plane in plot (a) at $t = 0.11$ (depicted by
a thick solid line).

Let $t^*$ denote the value of $t$ that achieves the supremum $\sigma_0$. For every $\sigma \in (0, \sigma_0)$, we may set
the value of the constant $c$ in Corollary 18 as

$c = \mathbb{E} e^{-t^* X_s} \cdot \left( (d_R - 1) \, e^{-\frac{1}{2\sigma^2}} \right)^{\frac{1}{d_L - 2}}$.

Figure 5(b) illustrates the value of the constant $c$ in Corollary 18 as a function of $\sigma$ in the case
where $s = 4$.
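
Since the set in (27) is an interval, the supremum $\sigma_0$ can be located by bisection on $\sigma$. The following sketch does this only for the simplest case $s = 0$, where $X_0$ is the minimum of $d_R - 1$ copies of $1 + \mathcal{N}(0, \sigma^2)$ and its density is available in closed form from (17). Larger $s$ would require the quantized recursion of Appendix B.2, which is not reproduced here. The bracketing interval, grid sizes, and all function names are assumptions of this sketch, and its output is an estimate rather than a reproduction of Table 1.

```python
import math


def min_laplace_s0(sigma, d_R, steps=1000, t_grid=None):
    """Grid estimate of min_t E exp(-t X_0), X_0 = min of d_R - 1 copies of 1 + N(0, sigma^2)."""
    if t_grid is None:
        t_grid = [0.1 * k for k in range(61)]
    n = d_R - 1
    lo = 1.0 - 8.0 * sigma                    # assumed truncation at 8 standard deviations
    h = 16.0 * sigma / steps
    xs = [lo + (k + 0.5) * h for k in range(steps)]
    pdf = []
    for x in xs:
        z = (x - 1.0) / sigma
        f = math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))
        F = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
        pdf.append(n * (1.0 - F) ** (n - 1) * f)   # density of the minimum, as in (17)
    return min(sum(p * math.exp(-t * x) for p, x in zip(pdf, xs)) * h for t in t_grid)


def condition_14(sigma, d_L, d_R, min_laplace):
    """Check inequality (14) for a given sigma."""
    rhs = ((d_R - 1) * math.exp(-1.0 / (2.0 * sigma ** 2))) ** (-1.0 / (d_L - 2))
    return min_laplace(sigma) < rhs


def sigma0_by_bisection(d_L, d_R, min_laplace, lo=0.1, hi=2.0, iters=12):
    """Bisection for the supremum in (27); assumes (14) holds at lo and fails at hi."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if condition_14(mid, d_L, d_R, min_laplace):
            lo = mid
        else:
            hi = mid
    return lo


if __name__ == "__main__":
    print(sigma0_by_bisection(3, 6, lambda s: min_laplace_s0(s, 6)))
```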